ABSTRACT


Many optimization algorithms generate, at each iteration, a pair $(x_k, H_k)$ consisting of an approximation to the solution, $x_k$, and a Hessian matrix approximation, $H_k$, which contains local second order information about the problem. Much is known about the convergence of the $x_k$ to the solution of the problem but relatively little about the behavior of the sequence of matrix approximations. We analyze the sequence $\{H_k\}$ generated by the extended Broyden class of updating schemes independently of the optimization setting in which they are used, deriving various conditions under which convergence is assured and delineating the structure of the limits. Rates of convergence are also obtained. Our results extend and clarify those already in the literature.


Key  words:  optimization,  quasi-Newton  algorithms,  matrix  updates,  convergence  analysis 
1. Introduction


In  this  paper  we  investigate  a general  class  of  matrix  updating  schemes  that  are  of  major  interest  in 
optimization  problems,  both  constrained  and  unconstrained.  We  consider  the  possible  convergence  of 
sequences  generated  by  these  schemes,  attempting  to  isolate  the  crucial  factors  that  determine 
convergence  independent  of  any  particular  optimization  setting.  The  results  presented  thus  subsume 
and  clarify  some  recent  results  on  convergence  of  Hessian  approximations  in  specific  optimization 
problems. 


In many iterative numerical optimization algorithms each iteration requires the solution (or partial solution) of an approximate quadratic model of the problem under consideration. Each surrogate model is usually characterized by the current iterate, $x_k$, and a symmetric, often positive definite, matrix, $H_k$. The solution of the quadratic model yields a step, $s_k$, from which the next iterate, $x_{k+1} = x_k + s_k$, is obtained. Once the new iterate is located a new quadratic model, i.e., a new matrix, $H_{k+1}$, is chosen. The choice is determined by a particular formula involving the information on hand, including the preceding matrix $H_k$, the step $s_k$, and the values of the problem functions at $x_k$ and $x_{k+1}$. The success of the algorithm is known to be sensitive in many cases to the method by which the new matrix is chosen; in particular, the best results are to be expected when the quadratic model closely reflects the important features of the underlying problem as the solution is approached. For this reason the convergence properties of the sequence $\{H_k\}$ are of significant interest. This paper is devoted to the study of these properties.


We examine an iteration scheme of the form

(1.1)   $H_{k+1} = U(\phi_k, H_k, s_k, y_k),$

where the $H_k$ are $N \times N$ symmetric matrices, the $\phi_k$ are scalar parameters, the $s_k$ are given vectors in $\mathbb{R}^N$, the $y_k$ are vectors which depend on the $s_k$ in some way to be specified, and the function $U$ is the update formula. Of particular interest will be how the convergence (and rate of convergence) of the sequence $\{H_k\}$ depends on the choice of the $\phi_k$, the initial matrix $H_0$, and the sequence $\{s_k\}$ for a given $U$.

The primary motivation for this study is the use of these updates in those optimization problems where the function $U$ adds a rank-two matrix to the current matrix $H_k$ in such a way that the secant condition

(1.2)   $H_{k+1} s_k = y_k$

is satisfied for each $k$. The best known example of the use of this updating method takes place in quasi-Newton algorithms for solving unconstrained optimization problems where the matrix $H_k$ is an approximation to the Hessian of the objective function. In this case $s_k$ is the step generated by

(1.3)   $s_k = -\alpha_k H_k^{-1} \nabla f(x_k)$

and

(1.4)   $y_k = \nabla f(x_{k+1}) - \nabla f(x_k).$

Here $f$ is the objective function, $x_k$ is the current iterate and $\alpha_k$ is a step length parameter. Similar updating schemes are also used in trust region methods for the unconstrained problem and for the sequential quadratic programming algorithms designed to solve constrained optimization problems.


The most commonly used class of updating functions in these cases is the so-called “extended Broyden” class where

(1.5)   $U(\phi_k, H_k, s_k, y_k) = \left(I - \frac{y_k s_k^t}{y_k^t s_k}\right) H_k \left(I - \frac{s_k y_k^t}{y_k^t s_k}\right) + \frac{y_k y_k^t}{y_k^t s_k} - \phi_k\,(s_k^t H_k s_k) \left(\frac{y_k}{y_k^t s_k} - \frac{H_k s_k}{s_k^t H_k s_k}\right) \left(\frac{y_k}{y_k^t s_k} - \frac{H_k s_k}{s_k^t H_k s_k}\right)^t.$

The special cases $\phi_k = 0$ and $\phi_k = 1$ correspond to the well-known DFP and BFGS update formulas respectively while the values of $\phi_k \in [0,1]$ give rise to the “restricted Broyden class” of updates. There is also a similar set of updates in which $s_k$ and $y_k$ are interchanged; these updating schemes yield approximations to the inverse Hessian. (See Dennis and Schnabel [1] or Fletcher [2].)
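
As an illustration of the extended Broyden class in the parametrization used here ($\phi = 0$ for DFP, $\phi = 1$ for BFGS), the following is a minimal Python sketch that is not taken from the paper. The function name `broyden_update` and the sample data are assumptions made for the example, and the formula is expanded entrywise so that only the standard library is needed. It assumes $H$ is symmetric with $s^t H s \ne 0$ and $y^t s \ne 0$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def broyden_update(phi, H, s, y):
    """Hypothetical helper: one extended Broyden update of a symmetric H.

    phi = 0 reproduces the DFP formula and phi = 1 the BFGS formula.
    Assumes s^t H s != 0 and y^t s != 0.
    """
    n = len(s)
    Hs = matvec(H, s)
    a = dot(s, Hs)          # s^t H s
    c = dot(y, s)           # y^t s
    v = [y[i] / c - Hs[i] / a for i in range(n)]
    return [[H[i][j]
             - (Hs[i] * y[j] + y[i] * Hs[j]) / c    # cross terms of the DFP part
             + (1.0 + a / c) * y[i] * y[j] / c      # y y^t terms of the DFP part
             - phi * a * v[i] * v[j]                # rank-one term scaled by phi
             for j in range(n)] for i in range(n)]

# Illustrative data (assumed, not from the paper).
H = [[2.0, 0.5], [0.5, 1.0]]
s = [1.0, 2.0]
y = [3.0, 1.0]
for phi in (0.0, 0.5, 1.0):
    H_new = broyden_update(phi, H, s, y)
    print(phi, matvec(H_new, s))   # each product should reproduce y up to rounding
```

Because the vector multiplying $\phi$ is orthogonal to $s$, the secant condition holds for every value of $\phi$, which is what the loop is meant to show.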
There has been a significant amount of research on the convergence of the matrices generated by these updating schemes, most generally for the case described above in which each $s_k$ is a step in an optimization algorithm. For example, Schuller [3], Ge and Powell [4], and Stoer [5] have determined conditions under which the sequence of matrices converges in the setting of variable metric algorithms where each $s_k$ is given by (1.3). More recently, Byrd, Nocedal, and Yuan [6] have studied the Broyden class of updates seeking to explain the observed superiority of the BFGS update over the DFP update in quasi-Newton algorithms. Conn, Gould, and Toint [7] have analyzed the special case of (1.5) where $U(\phi_k, H_k, s_k, y_k)$ is the symmetric rank one update. More significantly, they do not require that $s_k$ be generated by (1.3) but only that it represent the step $x_{k+1} - x_k$. (However, they do take the vector $y_k$ to be the difference in the gradients at $x_{k+1}$ and $x_k$.) As they point out, this more general formulation allows the analysis to include the important trust region methods as well as large scale methods that use partitioned matrices.

The results presented herein are in the spirit of the approach in [7] in that the $s_k$ are not required to satisfy (1.3); however, we go a step further and disassociate them from any particular optimization problem, although the setting is motivated by the unconstrained optimization problem. We will establish the convergence of $\{H_k\}$ under only minimal restrictions on $\{s_k\}$ that are not related directly to its being generated by an optimization process. Moreover, we will not require the $y_k$ to satisfy (1.4) but rather that they be related to the $s_k$ in a certain way that includes (1.4) as a special case. In spite of the less restrictive assumptions made on the parameters, some of the properties of the limit of the sequence $\{H_k\}$ are derived under various conditions on the sequence $\{s_k\}$, thus clarifying the results in [3]-[7]. Finally, our results can be applied to most of the common rank-two updating schemes that have been used in unconstrained optimization.

In Section 2 the nuances of the problem to be solved are discussed and a simplified framework in which to analyze the convergence is introduced. Specifically, we consider the class of updates as nonlinear perturbations of a single linear update which derives from setting $\phi_k = 0$ in (1.5). Section 3 contains some basic notation and terminology as well as some fundamental results on difference equations that are used in the remainder of the paper. In Section 4 the convergence of the linear update is established for the case when the sequence $\{s_k\}$ repeatedly spans $\mathbb{R}^N$ in a uniform manner and $y_k = s_k$ for each $k$. This latter condition occurs in the unconstrained optimization of a convex quadratic function. In Section 5 this result is generalized to allow the $s_k$ to approach a proper subspace of $\mathbb{R}^N$. In particular, the counterexample in [4] to general convergence of the sequence $\{H_k\}$ is clarified. In Section 6 the convergence results are extended to nonlinear updating schemes, i.e., choices of $\phi_k$ other than zero. Finally, in Section 7 the condition $y_k = s_k$ is relaxed, permitting the previously obtained results to be applied to the updates of the form (1.5). The basic requirement is only that the difference $y_k - s_k$ converge to zero at a specified rate. Section 8 contains a summary of the major conclusions in the paper, observations on their relevance, and suggestions for further avenues of research.


2.  Problem  formulation 


In the quasi-Newton algorithm for unconstrained optimization problems the vector $y_k$ is given by (1.4). Assuming that the sequence $\{x_k\}$ converges to the solution $x^*$, we can write

$y_k = F(x^*) s_k + (F(x_k) - F(x^*)) s_k + O(|s_k|^2),$

where $F(x)$ is the Hessian of $f$ at $x$. Thus

$y_k = F(x^*) s_k + t_k,$

where $|t_k| / |s_k| \to 0$. If $F = F(x^*)$ is positive definite then, since the form of the update (1.5) is invariant under the transformations,

(T)   $s \leftrightarrow F^{-1/2} s$, $y \leftrightarrow F^{1/2} y$, and $H \leftrightarrow F^{1/2} H F^{1/2}$,

$F(x^*)$ can, without loss of generality, be assumed to be the identity matrix. Using this case as motivation, the underlying assumption for our analysis of (1.5) will be that

(2.1)   $y_k = s_k + t_k,$

where $|t_k| / |s_k| \to 0$ at a rate to be specified. Note that if the objective function is quadratic then $t_k = 0$. Initially, we will analyze this special case with $y_k = s_k$. Then we will allow the more general formula (2.1) as a perturbation of this simpler situation. Further comments on this assumption are made in the course of the paper.
With the assumption that $y_k = s_k$ the update scheme generated by (1.5) no longer depends on the length of $s_k$ and can be written:

(2.2)   $\bar{H} = U(\phi, H, s, s) = (I - ss^t) H (I - ss^t) + ss^t - \phi\,(s^t H s) \left(s - \frac{Hs}{s^t H s}\right) \left(s - \frac{Hs}{s^t H s}\right)^t$

where the subscript $k$ has been deleted, $\bar{H}$ has replaced $H_{k+1}$, and $|s| = 1$. It is observed that both the Broyden class of update formulas for the approximations to the inverse Hessian and the Hessian satisfy this equation when $y = s$. In fact, (2.2) represents the most general rank-two symmetric update formula that uses the vectors $s$ and $Hs$ and satisfies the secant equation (1.2) (which is now $\bar{H}s = s$). Thus (2.2) represents a rather general starting point for analyzing the convergence properties of rank-two updates. It should be noted that other rank-two update formulae, such as the well-known PSB update, that are not members of the Broyden class may not be able to be placed in this form because they are not invariant under the transformations (T); that is, it cannot be assumed that the Hessian of the objective function is the identity matrix. As is noted in Section 7, however, in certain cases the convergence of the updates may still be analyzed by the methods we develop.

The two special cases identified in Section 1 are singled out for emphasis: $\phi = 0$, which yields

(L)   $\bar{H} = (I - ss^t) H (I - ss^t) + ss^t$

and $\phi = 1$, which gives

(C)   $\bar{H} = H + ss^t - \frac{H s s^t H}{s^t H s}.$

These two updating formulas, which we will hereafter refer to as the “L update” (L for linear) and the “C update” (for complementary), can be thought of as versions of the well known DFP and BFGS updating formulas. The reader should be careful to note that the BFGS, DFP, and other updates are not uniquely identified in the formula (2.2) because the assumption that $y = s$ removes the reference to the particular optimization setting that might generate $s$. For example, the L update is derived from the DFP update for the approximation of the Hessian matrix or, alternatively, the BFGS update for approximating the inverse Hessian matrix. Similarly, the C update derives from the direct BFGS or inverse DFP update.
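
The L and C updates are simple enough to write out directly. The sketch below is an illustration only: the names `l_update` and `c_update` and the sample matrix are assumptions, $H$ is taken to be symmetric, and $s$ is taken to be a unit vector. It checks that both updates map $s$ to itself, which is the secant equation in the normalized setting $y = s$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def l_update(H, s):
    """(I - s s^t) H (I - s s^t) + s s^t for symmetric H and |s| = 1."""
    n = len(s)
    Hs = matvec(H, s)
    a = dot(s, Hs)                      # s^t H s
    return [[H[i][j] - s[i] * Hs[j] - Hs[i] * s[j] + (a + 1.0) * s[i] * s[j]
             for j in range(n)] for i in range(n)]

def c_update(H, s):
    """H + s s^t - H s s^t H / (s^t H s) for symmetric H and |s| = 1."""
    n = len(s)
    Hs = matvec(H, s)
    a = dot(s, Hs)                      # s^t H s, assumed nonzero
    return [[H[i][j] + s[i] * s[j] - Hs[i] * Hs[j] / a
             for j in range(n)] for i in range(n)]

# Illustrative data (assumed, not from the paper).
H = [[2.0, 0.5], [0.5, 1.0]]
s = [0.6, 0.8]                          # unit vector
print(matvec(l_update(H, s), s))        # should reproduce s up to rounding
print(matvec(c_update(H, s), s))        # should reproduce s up to rounding
```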
We will first be concerned with the convergence of the sequence $\{H_k\}$ generated by the iterative scheme (2.2) for arbitrary positive definite starting matrices $H_0$ (although many of our results will apply for arbitrary symmetric matrices) and specified sequences of unit vectors $\{s_k\}$. Two types of methods of choosing the parameter $\phi_k$ can be considered: a “static” procedure where $\phi_k$ is constant over all iterations and a “dynamic” method where the choice of $\phi_k$ will vary from iteration to iteration and may depend on the current values of $H_k$ and $s_k$. (If the dynamic method is derived from the original update formula (1.5) then it must be invariant under the transformations given above if it is to be considered here.) The static method is typified by the L ($\phi_k = 0$ for all $k$) and the C ($\phi_k = 1$ for all $k$) updates and the dynamic method by the “symmetric rank one” update discussed below. In theory, the dynamic approach can be used to take better advantage of the current information and can therefore lead to more rapid convergence of the sequence $\{H_k\}$. This is illustrated by the aforementioned symmetric rank one scheme in which the choice of $\phi_S$ leads to convergence of the $\{H_k\}$ in a finite number ($\le N$) of steps whenever the vectors $s_0, \ldots, s_{N-1}$ are linearly independent.

It is useful to consider the inverse corresponding to the update (2.2). Assuming that $H$ and $\bar{H}$ are invertible, an application of the Sherman-Morrison formula yields

(2.3)   $\bar{H}^{-1} = U(\mu(\phi), H^{-1}, s, s)$

where

$\mu(\phi) = \frac{(1-\phi)(s^t H s)(s^t H^{-1} s)}{\phi + (1-\phi)(s^t H s)(s^t H^{-1} s)}.$

It is observed that $\mu(0) = 1$ and $\mu(1) = 0$, illustrating the property that the inverse of the L update is the C update of the inverse and vice versa. Moreover, if $H$ is positive definite then $(s^t H s)(s^t H^{-1} s) \ge 1$, and it is seen that $0 \le \phi \le 1$ implies that $\mu(\phi)$ falls between 0 and 1. Thus the inverse of a restricted Broyden update of a positive definite matrix can be obtained by a (dynamic) restricted Broyden update of the inverse of the matrix.
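
The duality between an update and the update of the inverse can be checked numerically. The sketch below is an assumption-laden illustration rather than the paper's own computation: it takes the update in the normalized form with $y = s$ and $|s| = 1$, uses the parameter map $\mu(\phi) = (1-\phi)b / (\phi + (1-\phi)b)$ with $b = (s^t H s)(s^t H^{-1} s)$, and restricts to $2 \times 2$ matrices so that the inverse can be written in closed form. The names `update`, `inv2`, and `mu` are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def update(phi, H, s):
    """Normalized Broyden-class update (y = s, |s| = 1) of a symmetric H."""
    n = len(s)
    Hs = matvec(H, s)
    a = dot(s, Hs)                           # s^t H s
    v = [s[i] - Hs[i] / a for i in range(n)]
    return [[H[i][j] - s[i] * Hs[j] - Hs[i] * s[j] + (a + 1.0) * s[i] * s[j]
             - phi * a * v[i] * v[j]
             for j in range(n)] for i in range(n)]

def inv2(M):
    """Inverse of a 2 x 2 matrix (assumed nonsingular)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mu(phi, H, s):
    """Parameter of the corresponding update applied to the inverse of H."""
    b = dot(s, matvec(H, s)) * dot(s, matvec(inv2(H), s))
    return (1.0 - phi) * b / (phi + (1.0 - phi) * b)

# Illustrative data (assumed, not from the paper).
H = [[2.0, 0.5], [0.5, 1.0]]
s = [0.6, 0.8]
phi = 0.3
left = inv2(update(phi, H, s))                 # inverse of the updated matrix
right = update(mu(phi, H, s), inv2(H), s)      # update of the inverse
print(left)
print(right)                                   # the two should agree up to rounding
```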

It should also be observed that if $H$ is positive definite then so is $H^{-1}$ and thus, since the L and C updates preserve positive definiteness, (2.2) and (2.3) imply, by the interleaving eigenvalues theorem (see [8]), that $\bar{H}$ and $\bar{H}^{-1}$ are positive definite if either $\phi$ or $\mu(\phi)$ is less than or equal to zero. More specifically we have
Proposition 2.1 (Fletcher [2]): Let $H$ be positive definite. Then the iterate, $\bar{H}$, given by (2.2) is positive definite if and only if $\phi < \phi_0$ where

$\phi_0 = \frac{(s^t H s)(s^t H^{-1} s)}{(s^t H s)(s^t H^{-1} s) - 1}.$

Proof: Since $(s^t H s)(s^t H^{-1} s) \ge 1$, clearly $\phi_0 > 1$. If the inequality holds then either $\phi \le 1$ or $\mu(\phi) < 0$ and by the remarks above $\bar{H}$ is positive definite. If $\phi = \phi_0$ then from (2.3) it is seen that $\bar{H}$ is singular. From (2.2) it follows that if $\bar{H}$ is positive definite for $\phi = \phi_1$ and for $\phi = \phi_2 > \phi_1$ then it is positive definite in $(\phi_1, \phi_2)$. Thus $\bar{H}$ cannot be positive definite for any $\phi > \phi_0$. •


The case where

$\phi = \phi_S = \frac{s^t H s}{s^t H s - 1}$

is of special interest because the rank-two update formula (2.2) reduces to the rank one update

(SR1)   $\bar{H} = H + \frac{(s - Hs)(s - Hs)^t}{s^t (s - Hs)}$

known as the “symmetric rank one” (SR1) update. Note that the update formula for $\bar{H}^{-1}$ is identical to that of $\bar{H}$ for the SR1 update (with $H^{-1}$ replacing $H$). If $H$ is positive definite the SR1 update is never a restricted Broyden update because $\phi_S$ is either greater than one (if $s^t H s > 1$) or less than zero (if $s^t H s < 1$). Moreover, $\bar{H}$ may not be positive definite when $H$ is positive definite if $\phi_S > 1$; in fact, the SR1 update will not exist if $(I - H)s$ is orthogonal to $s$. It follows from Proposition 2.1 and the definition of $\phi_S$ that the SR1 update is positive definite if and only if either $s^t H s < 1$ ($\phi_S < 0$) or $s^t H^{-1} s < 1$ ($1 < \phi_S < \phi_0$).
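
That a particular choice of the parameter collapses the rank-two update to the SR1 update can be verified on a small example. The following sketch is illustrative only: it assumes the normalized setting $y = s$, $|s| = 1$, takes the parameter value $\phi_S = s^t H s / (s^t H s - 1)$, and uses hypothetical function names `update` and `sr1_update`.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def update(phi, H, s):
    """Normalized Broyden-class update (y = s, |s| = 1) of a symmetric H."""
    n = len(s)
    Hs = matvec(H, s)
    a = dot(s, Hs)                           # s^t H s
    v = [s[i] - Hs[i] / a for i in range(n)]
    return [[H[i][j] - s[i] * Hs[j] - Hs[i] * s[j] + (a + 1.0) * s[i] * s[j]
             - phi * a * v[i] * v[j]
             for j in range(n)] for i in range(n)]

def sr1_update(H, s):
    """H + (s - Hs)(s - Hs)^t / (s^t (s - Hs)); assumes the denominator is nonzero."""
    n = len(s)
    Hs = matvec(H, s)
    r = [s[i] - Hs[i] for i in range(n)]
    d = dot(s, r)
    return [[H[i][j] + r[i] * r[j] / d for j in range(n)] for i in range(n)]

# Illustrative data (assumed, not from the paper).
H = [[2.0, 0.5], [0.5, 1.0]]
s = [0.6, 0.8]
a = dot(s, matvec(H, s))
phi_s = a / (a - 1.0)                        # SR1 value of the parameter
print(update(phi_s, H, s))
print(sr1_update(H, s))                      # the two should agree up to rounding
```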


The convergence properties of the static schemes corresponding to the L and C updates are interrelated as shown in the following proposition.

Proposition 2.2: Let $P$ be any subset of the symmetric matrices for which $H \in P$ implies $H^{-1} \in P$. If for a given sequence $\{s_k\}$ the sequence $\{H_k\}$ generated by the L update converges for every initial matrix chosen from $P$, then the same is true for the C update, and conversely.

Proof: Suppose that the sequence $\{H_k\}$ generated by the L updating formula converges to $L(H_0)$ for a starting matrix $H_0 \in P$. Then the sequence $\{H_k^{-1}\}$ converges to $L(H_0)^{-1}$. But the sequence $\{H_k^{-1}\}$ is generated by the C updating formula starting from $H_0^{-1}$ and so the C formula yields convergent sequences starting in $P$. •


It is well-known that for any choice of linearly independent $s_k$, $k = 0, \ldots, N-1$, and any initial $H_0$ the $H_k$ generated by the SR1 update satisfy the “hereditary” secant condition

(2.4)   $H_{k+1} s_j = s_j$, $j = 0, \ldots, k$.

It is easily shown by induction that this condition is also satisfied by all updates of the form (2.2) if the $\{s_k\}$ are orthogonal (corresponding to conjugate directions in the quadratic optimization setting). The condition (2.4) implies that the $H_k$ converge in at most $N$ steps and if the full $N$ steps are required then the limiting matrix is the identity. Thus the iteration matrices converge finitely for any sequence $\{\phi_k\}$ when the $\{s_k\}$ are orthogonal and for the SR1 updates when the $\{s_k\}$ are linearly independent. In both cases the limiting matrix is the identity matrix and so it is natural to expect that, generally, if the matrices converge nonfinitely they will also converge to the identity matrix (other possibilities are considered in Section 5). Accordingly, it is somewhat simpler to consider the convergence of the sequence of matrices $\{E_k\}$ where $E_k = H_k - I$. Using the relations $s_k^t s_k = 1$ and $H_k s_k = E_k s_k + s_k$ in (2.2), the $E_k$ can be seen to satisfy the difference equation

(2.5)   $E_{k+1} = (I - s_k s_k^t) E_k (I - s_k s_k^t) - \frac{\phi_k}{1 + s_k^t E_k s_k} \left(E_k s_k - (s_k^t E_k s_k) s_k\right) \left(E_k s_k - (s_k^t E_k s_k) s_k\right)^t.$

It is this form of the iteration scheme that we shall analyze first. Note that the quasi-Newton condition for the $E_k$ becomes $E_{k+1} s_k = 0$ so that the matrices $E_k$, $k \ge 1$, are always singular and hence never positive definite; however, they are symmetric if $E_0$ is symmetric. The eigenvectors of $H_k$ and $E_k$ are identical and the eigenvalues of $H_k$ are exactly one unit greater than the eigenvalues of $E_k$. Thus if the $H_k$ are positive definite, the eigenvalues of the $E_k$ are greater than negative one and conversely. As a result, (2.5) is well-defined provided positive definiteness of the $\{H_k\}$ is maintained, as in the L and C updates. However, if the matrix $H_k$ is not positive definite then $1 + s_k^t E_k s_k$ may vanish, resulting in $E_{k+1}$ being undefined.


It is a well-known fact that both the L and C updating formulas (and therefore those of the restricted Broyden class) lead to sequences $\{E_k\}$ whose Frobenius norms are nonincreasing and, in fact, are strictly decreasing if $E_k s_k \ne 0$. (See, for instance, [4] or [5].) However, as illustrated in [4], this does not ensure convergence of the matrices $\{E_k\}$ nor does it yield any limiting matrix when the sequence does converge. It should be noted that not all updates of the form (2.5) lead to sequences $\{E_k\}$ that decrease in the Frobenius norm. For example, the Frobenius norms of the matrices $E_k$ generated by the SR1 update can have large jumps in value, although convergence within $N$ steps is assured. The analysis of the next sections is intended to shed light on the convergence of these matrices and their limits.
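
The monotone behavior of the Frobenius norm of the error matrix under the L and C updates can be observed on a small experiment. The sketch below is illustrative and rests on stated assumptions: the starting matrix is an arbitrary positive definite example, the unit directions are generated by a deterministic trigonometric rule chosen only for reproducibility, and the helper names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def l_update(H, s):
    """(I - s s^t) H (I - s s^t) + s s^t for symmetric H and |s| = 1."""
    n = len(s)
    Hs = matvec(H, s)
    a = dot(s, Hs)
    return [[H[i][j] - s[i] * Hs[j] - Hs[i] * s[j] + (a + 1.0) * s[i] * s[j]
             for j in range(n)] for i in range(n)]

def c_update(H, s):
    """H + s s^t - H s s^t H / (s^t H s) for symmetric H and |s| = 1."""
    n = len(s)
    Hs = matvec(H, s)
    a = dot(s, Hs)
    return [[H[i][j] + s[i] * s[j] - Hs[i] * Hs[j] / a
             for j in range(n)] for i in range(n)]

def error_norm(H):
    """Frobenius norm of E = H - I."""
    n = len(H)
    return math.sqrt(sum((H[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(n) for j in range(n)))

def unit(v):
    r = math.sqrt(dot(v, v))
    return [x / r for x in v]

def norm_history(upd, H0, steps):
    """Error norms along the sequence produced by repeatedly applying upd."""
    H, norms = H0, [error_norm(H0)]
    for s in steps:
        H = upd(H, s)
        norms.append(error_norm(H))
    return norms

# Illustrative data (assumed, not from the paper).
H0 = [[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.5]]   # positive definite
steps = [unit([math.cos(k), math.sin(k), math.cos(2.0 * k)]) for k in range(12)]
print(norm_history(l_update, H0, steps))   # expected to be nonincreasing
print(norm_history(c_update, H0, steps))   # expected to be nonincreasing
```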


3.  Notation  and  preliminary  results 


Throughout this paper the notation $|x|$ will refer to the Euclidean norm of the vector $x$. For a matrix $A$, $\|A\|$ will denote the operator norm of $A$, i.e.,

$\|A\| = \sup \{ |Ax| : |x| = 1 \},$

while $\|A\|_F$ will denote the Frobenius norm,

$\|A\|_F = \left( \mathrm{trace}(A^t A) \right)^{1/2}.$

In referring to rates of convergence the term “m-step linear convergence” will mean Q-order convergence; that is, the sequence $\{x_k - x^*\}$ converges to zero with an m-step linear rate if there is a $\beta$, $0 < \beta < 1$, independent of $k$ such that

$|x_{k+m} - x^*| \le \beta\, |x_k - x^*|$

for all $k$ sufficiently large. One-step linear convergence will be called simply “linear convergence”. We will also refer to R-(order) linear convergence, which requires

$|x_k - x^*|^{1/k} \le \beta < 1$

for all $k$ sufficiently large. The important implications for our work are that m-step linear convergence implies R-linear convergence and that the components of a vector converge R-linearly if and only if the vector itself converges R-linearly (which does not hold for m-step linear convergence). It will also be useful to note that a sequence converges R-linearly if and only if it is majorized by a sequence converging (one-step) linearly. That is, $\{x_k\}$ converges R-linearly if and only if there are constants $\nu$ and $r$, $0 < r < 1$, such that for all $k$

$|x_k - x^*| \le \nu\, r^k.$

More complete information on the definitions and properties of Q and R order convergence can be found in [9].

Many  of  the  proofs  in  the  following  sections  will  depend  upon  the  properties  of  solutions  of  difference 
equations.  To  simplify  the  presentations  in  Sections  4-7  we  provide  in  this  section  some  useful 
theorems  on  such  systems. 


Consider the system of nonlinear difference equations

(3.1)   $z_{k+1} = A_k z_k + Z_k(z_k, w_k, t_k)$
        $w_{k+1} = w_k + W_k(z_k, w_k, t_k)$

where $z_k \in \mathbb{R}^p$, $w_k$ and $t_k$ are vectors of fixed dimensions, and $A_k$ is a $p \times p$ matrix. Denote, for any fixed positive integer $m$,

(3.2)   $\beta_{k,m} = \left\| \prod_{i=0}^{m-1} A_{k+i} \right\|.$

The following give convergence results for the solution to (3.1) under a variety of conditions on the matrix $A_k$ and the functions $Z_k$ and $W_k$.

Theorem 3.1: Suppose that $\{z_k\}$ and $\{w_k\}$ satisfy (3.1) and, in addition,

(i) $\beta_{k,1} = \|A_k\| \le 1$ for each $k$;

(ii) for some fixed integer $m$, there is a constant $\beta < 1$ and an infinite sequence of integers $\{k_j\}$ such that $\beta_{k_j,m} \le \beta$ for each $j$;

(iii) there exist constants $\kappa_1$ and $\kappa_2$ independent of $k$ such that $|Z_k(z, w, t)| \le \kappa_1 |t|$ and $|W_k(z, w, t)| \le \kappa_2 |t|$ for every $k$; and

(iv) the sequence $\{t_k\}$ converges to zero R-linearly.

Then for any $w_0$ there exists a vector $w^*$ such that $\{w_k\} \to w^*$ R-linearly and for any $z_0$ the sequence $\{z_k\}$ converges to the zero vector. If condition (ii) is replaced by

(ii)' for some fixed integer $m$, there is a constant $\beta < 1$ such that $\beta_{k,m} \le \beta$ for every $k$ sufficiently large;

then the convergence rate of $\{z_k\}$ is R-linear and if $Z_k = 0$ for each $k$ the rate is m-step linear.


Theorem 3.2: Suppose that conditions (i), (ii)', and (iv) of Theorem 3.1 hold and, in addition:

(iii)' there are positive constants $\kappa_1$, $\kappa_2$, $\kappa_3$, $\kappa_4$, and $\delta$ such that for each $k$

$|Z_k(z, w, t)| \le \kappa_1 |z|^2 + \kappa_2 (|z| + |w|) \cdot |t|$

and

$|W_k(z, w, t)| \le \kappa_3 |z|^2 + \kappa_4 (|z| + |w|) \cdot |t|$

whenever $|z| < \delta$.

Then there exist positive constants $r_0$ and $r_1$ and a vector $w^*$ such that if $|z_0| < r_0$ and $|t_k| < r_1$ for every $k$, then the solutions to (3.1) have the properties that $\{z_k\} \to 0$ and $\{w_k\} \to w^*$ and the convergence rates are R-linear. If, in particular, $t_k = 0$ for every $k$ then the convergence rate of $\{z_k\}$ is m-step linear.

Sketches of the proofs of these two theorems can be found in the Appendix.


4.  Convergence  of  the  L updating  method 


When $\phi_k = 0$ (which corresponds to the L update) the iterations (2.5) reduce to

(4.1)   $E_{k+1} = (I - s_k s_k^t) E_k (I - s_k s_k^t).$

Note that for fixed $s_k$ (4.1) defines a linear transformation from the space of $N \times N$ matrices into itself. The other updates of the Broyden class ($\phi_k \ne 0$) can in fact be considered to be nonlinear perturbations of this linear mapping. We will exploit this fact in Section 6 to derive convergence results for the cases where $\phi_k \ne 0$. Denoting the mapping defined by (4.1) as

$G(s_k; \cdot) : L(\mathbb{R}^N, \mathbb{R}^N) \to L(\mathbb{R}^N, \mathbb{R}^N),$

we have that

(4.2)   $E_{k+1} = G(s_k; E_k).$

Since the mapping preserves symmetry, $\mathcal{S}_N$, the set of symmetric $N \times N$ matrices, is a $\frac{1}{2}N(N+1)$-dimensional invariant subspace for $G(s_k; \cdot)$. If we assign the Frobenius norm to $L(\mathbb{R}^N, \mathbb{R}^N)$ so that it is an inner product space then, as the following proposition shows, the mapping $G(s; \cdot)$ is a projection.


Proposition 4.1: Let $s$ be a given unit vector. The mapping $G(s; \cdot)$ is a projection onto an $(N-1)^2$-dimensional subspace of $L(\mathbb{R}^N, \mathbb{R}^N)$. An orthonormal set of eigenvectors corresponding to the eigenvalue zero is

$ss^t,\ s(v^2)^t,\ s(v^3)^t, \ldots, s(v^N)^t,\ v^2 s^t, \ldots, v^N s^t,$

and an orthonormal set of eigenvectors corresponding to the eigenvalue one is

$v^2(v^2)^t,\ v^2(v^3)^t, \ldots, v^2(v^N)^t,\ v^3(v^2)^t, \ldots, v^3(v^N)^t, \ldots, v^N(v^2)^t, \ldots, v^N(v^N)^t,$

where $s, v^2, \ldots, v^N$ is an orthonormal basis for $\mathbb{R}^N$.

Proof: Since $G^T = G$ and $G \circ G = G$, $G$ is a projection. The orthonormality follows from the facts that for N-vectors $u$, $v$, $w$, and $y$:

$\|uv^t\|_F = |u|\,|v|$

and

$(uv^t)^T (wy^t) = (u^t w)(v^t y)$

where the superscript “T” represents the inner product in $L(\mathbb{R}^N, \mathbb{R}^N)$. The eigenvalue properties are easily checked. •
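
The projection property can be checked on a small example. The sketch below is illustrative only: it takes the mapping to be $W \mapsto (I - ss^t) W (I - ss^t)$ for a unit vector $s$, uses the hypothetical name `G` for it, and works in two dimensions with a hand-picked orthonormal pair $s$, $v$.

```python
def G(s, W):
    """The linear map W -> (I - s s^t) W (I - s s^t) for a unit vector s."""
    n = len(s)
    Ws = [sum(W[i][j] * s[j] for j in range(n)) for i in range(n)]    # W s
    sW = [sum(s[i] * W[i][j] for i in range(n)) for j in range(n)]    # s^t W
    sWs = sum(s[i] * Ws[i] for i in range(n))                         # s^t W s
    return [[W[i][j] - s[i] * sW[j] - Ws[i] * s[j] + sWs * s[i] * s[j]
             for j in range(n)] for i in range(n)]

def outer(u, v):
    return [[ui * vj for vj in v] for ui in u]

# Illustrative data (assumed, not from the paper).
s = [0.6, 0.8]                 # unit vector
v = [-0.8, 0.6]                # unit vector orthogonal to s
W = [[1.0, 2.0], [3.0, 4.0]]   # arbitrary, not symmetric

once = G(s, W)
twice = G(s, once)
print(once, twice)             # applying G twice should change nothing
print(G(s, outer(s, s)))       # s s^t should be mapped to (numerically) zero
print(G(s, outer(v, v)))       # v v^t should be left (numerically) unchanged
```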


Since $\mathcal{S}_N$ is an invariant subspace of $G(s; \cdot)$, it is of interest to provide an orthonormal basis for $\mathcal{S}_N$ consisting of eigenvectors of $G(s; \cdot)$. The following is a direct consequence of Proposition 4.1.

Proposition 4.2: The following symmetric matrices are orthonormal eigenvectors of $G(s; \cdot)$ corresponding to the eigenvalue zero:

$ss^t,\ \frac{s(v^2)^t + v^2 s^t}{\sqrt{2}}, \ldots, \frac{s(v^N)^t + v^N s^t}{\sqrt{2}},$

and the following are orthonormal eigenvectors corresponding to the eigenvalue one:

$v^k(v^k)^t,\ \frac{v^k(v^j)^t + v^j(v^k)^t}{\sqrt{2}}$

for $k = 2, \ldots, N$ and $j = k+1, \ldots, N$.


Since $G(s; \cdot)$ has eigenvalues equal to one it is not a contraction mapping, even when restricted to $\mathcal{S}_N$, and so linear convergence of the $E_k$ is ruled out in general. Since $G(s; \cdot)$ is a projection matrix it is possible, however, that a sequence of applications will reduce the norm of the $E_k$, i.e., that some multi-step linear convergence can occur. In particular, we consider the sequence of $N$ applications of $G$,

$G_N(\cdot) = G(s_{N-1}; \cdot) \circ G(s_{N-2}; \cdot) \circ \cdots \circ G(s_0; \cdot).$

We will show that $G_N$ is a contraction mapping provided $s_0, s_1, \ldots, s_{N-1}$ are linearly independent. To do that we must show that the operator norm of $G_N$,

$\|G_N\| = \max \{ \|G_N(W)\|_F : W \in L(\mathbb{R}^N, \mathbb{R}^N),\ \|W\|_F = 1 \},$

is less than one.

By Proposition 4.1 it is seen that there is an orthonormal basis for $L(\mathbb{R}^N, \mathbb{R}^N)$ consisting of rank one matrices. If these basis matrices are denoted by $W^1, W^2, \ldots, W^K$ with $K = N^2$, then for any $W \in L(\mathbb{R}^N, \mathbb{R}^N)$ with $\|W\|_F = 1$

$W = \sum_{j=1}^{K} \gamma_j W^j$ and $\sum_{j=1}^{K} (\gamma_j)^2 = 1.$

To obtain the desired results we define the set of vectors $\{v^j\}$ that generate the basis matrices in a special manner. If we assume that the set of vectors $\{s_0, s_1, \ldots, s_{N-1}\}$ is linearly independent and for each $j$, $j = 0, \ldots, N-1$, let $S_j = \mathrm{span}\{s_0, \ldots, s_j\}$, then we may choose the vectors $v^j$ as follows:

(i) $v^0 = s_0$;

(ii) $v^j \in S_j$ and $v^j \in (S_{j-1})^{\perp}$;

(iii) $|v^j| = 1$.

This set of vectors exists (but is not unique) because of the linear independence of the $s_j$ and it forms an orthonormal basis for $\mathbb{R}^N$. With this definition we have

(4.3)   $s_j = \sum_{k=0}^{j} (s_j^t v^k)\, v^k$ where $\sum_{k=0}^{j} (s_j^t v^k)^2 = 1.$

Defining

(4.4)   $u_0 = \sum_{k=0}^{N-1} \gamma_{0,k}\, v^k$ where $\sum_{k=0}^{N-1} (\gamma_{0,k})^2 = 1,$

and, for $i = 0, \ldots, N-1$,

(4.5)   $u_{i+1} = (I - s_i s_i^t)\, u_i = \sum_{k=0}^{N-1} \gamma_{i+1,k}\, v^k,$

we have

Proposition 4.3: The $\gamma_{i,k}$ in (4.5) satisfy the recurrences

(4.6)   $\gamma_{i+1,k} = \gamma_{i,k} - (s_i^t u_i)(s_i^t v^k)$ for $k \le i$, and $\gamma_{i+1,k} = \gamma_{i,k}$ for $k > i$,

and

(4.7)   $\sum_{k=0}^{N-1} (\gamma_{i+1,k})^2 = \sum_{k=0}^{N-1} (\gamma_{i,k})^2 - (s_i^t u_i)^2.$


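Proof: From (4.3) and (4.5) we have

$u_{i+1} = (I - s_i s_i^t)\, u_i = u_i - (s_i^t u_i)\, s_i = u_i - (s_i^t u_i) \sum_{k=0}^{i} (s_i^t v^k)\, v^k$

and (4.6) follows from equating coefficients of $v^k$. Then

$\sum_{k=0}^{N-1} (\gamma_{i+1,k})^2 = \sum_{k=i+1}^{N-1} (\gamma_{i,k})^2 + \sum_{k=0}^{i} \left(\gamma_{i,k} - (s_i^t u_i)(s_i^t v^k)\right)^2 = \sum_{k=0}^{N-1} (\gamma_{i,k})^2 - (s_i^t u_i)^2,$

where we have used the fact (from (4.3)) that $\sum_{k=0}^{i} (s_i^t v^k)^2 = 1$ together with $\sum_{k=0}^{i} \gamma_{i,k}\,(s_i^t v^k) = s_i^t u_i$. •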


It now follows from (4.7) that

$|u_{i+1}|^2 = (1 - \cos^2(\theta_i))\, |u_i|^2$

where $\theta_i$ is the angle between $s_i$ and $u_i$. Therefore,

(4.8)   $|u_N|^2 = \prod_{i=0}^{N-1} (1 - \cos^2(\theta_i))\, |u_0|^2.$

Setting

(4.9)   $R = \prod_{j=0}^{N-1} (I - s_j s_j^t) = (I - s_{N-1} s_{N-1}^t) \cdots (I - s_0 s_0^t),$

it is seen that $|R u_0| = |u_N| < |u_0|$ unless $\theta_i = \pi/2$ for all $i$. However $\theta_i$ depends on $u_i$ and hence $u_0$ so that, in principle, $|u_N|$ could be arbitrarily close to $|u_0|$. We would like to show that the factor in (4.8) actually is bounded away from one for all $u_0$. This is the content of the next proposition.

Proposition  4.4:  Given  linearly  independent  Sg,...,Sj^_j  there  is  a constant  o,  0 < a < 1,  such  that 
for  all  Ug  £ 3^^ 

I |2  ^ 2 I |2 

< oc  |uq|  . 


Proof: We prove that the result holds for all $u_0$ with $|u_0| = 1$; the general case then follows in a straightforward manner. The proof is by contradiction. If the proposition is not true then there exists a sequence of vectors $\{u_0^j\}$ with $|u_0^j| = 1$ and

$|u_N^j|^2 > (1 - \tfrac{1}{j})\, |u_0^j|^2$.

Let $\bar u_0$ be any limit point of the sequence $\{u_0^j\}$ and $\bar u_N = R\, \bar u_0$. Then $|\bar u_N|^2 = |\bar u_0|^2 = 1$. Clearly, by (4.8), this implies that $\theta_i = \pi/2$ (or $s_i^t \bar u_i = 0$) for all $i$. Therefore, by (4.6), it follows that for all $i$

$\bar u_i = \sum_{k=0}^{N-1} \gamma_{0,k}\, v^k = \bar u_0$

and hence $s_i^t \bar u_0 = 0$. The linear independence of the $s_i$ implies that $\bar u_0 = 0$, contradicting the fact that $|\bar u_0| = 1$. •
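As a small worked illustration of Proposition 4.4, consider the case $N = 2$. The two vectors and the angle $\omega$ below are hypothetical choices made only for this example; they are not taken from the paper.

```latex
\[
\begin{aligned}
s_0 &= \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad
s_1 = \begin{pmatrix} \cos\omega \\ \sin\omega \end{pmatrix}, \qquad \sin\omega \neq 0, \\[4pt]
R &= (I - s_1 s_1^t)(I - s_0 s_0^t)
   = \begin{pmatrix} \sin^2\omega & -\sin\omega\cos\omega \\ -\sin\omega\cos\omega & \cos^2\omega \end{pmatrix}
     \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
   = \begin{pmatrix} 0 & -\sin\omega\cos\omega \\ 0 & \cos^2\omega \end{pmatrix}, \\[4pt]
u_0 &= \begin{pmatrix} a \\ b \end{pmatrix}: \qquad
R\,u_0 = b\cos\omega \begin{pmatrix} -\sin\omega \\ \cos\omega \end{pmatrix}, \qquad
|u_2|^2 = b^2\cos^2\omega \;\le\; \cos^2\omega\,|u_0|^2 .
\end{aligned}
\]
```

In this example one may take $\alpha = |\cos\omega|$ (or any $\alpha$ in $(0,1)$ when $\cos\omega = 0$, since then $R = 0$). The constant approaches one as $s_1$ becomes parallel to $s_0$, which is why the bound depends on the linear independence of the $s_j$.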


Using the formula $\|V\|_F^2 = \mathrm{trace}(V V^t)$ and the fact that $G_N(u v^t) = R u v^t R^t$ where $R$ is given by (4.9), it is seen that Proposition 4.4 yields

$\|G_N(u v^t)\|_F = |Ru|\,|Rv| < |u|\,|v| = \|u v^t\|_F$.

In order to show that $G_N$ is a contraction mapping it is necessary to extend this inequality to all matrices with unit Frobenius norm.
It now follows from Proposition 4.1 that each basis matrix for $L(\Re^N, \Re^N)$ can be written in the form $v^i (v^k)^t$ for some $i$ and $k$, $0 \le i \le N-1$, $0 \le k \le N-1$. Thus an arbitrary matrix $W$ (with unit Frobenius norm) can be expressed as

$W = \sum_{k=0}^{N-1} \sum_{i=0}^{N-1} \zeta_{i,k}\, v^i (v^k)^t$

where

$\sum_{k=0}^{N-1} \sum_{i=0}^{N-1} \zeta_{i,k}^2 = 1$.

For any such matrix $W$, $G_N(W) = R W R^t$ where $R$ is as given in (4.9). Thus

$G_N(W) = R \left[ \sum_{i=0}^{N-1} \sum_{k=0}^{N-1} \zeta_{i,k}\, v^i (v^k)^t \right] R^t = R \sum_{i=0}^{N-1} \beta_i\, v^i \left[ R \sum_{k=0}^{N-1} \rho_{i,k}\, v^k \right]^t$

where

$\rho_{i,k} = \zeta_{i,k} / \beta_i$ and $\beta_i = \left( \sum_{k=0}^{N-1} \zeta_{i,k}^2 \right)^{1/2}$.

Thus

$\sum_{k=0}^{N-1} \rho_{i,k}^2 = 1$ and $\sum_{i=0}^{N-1} \beta_i^2 = 1$.

Then by Proposition 4.4

$R \sum_{k=0}^{N-1} \rho_{i,k}\, v^k = \sum_{k=0}^{N-1} \hat\rho_{i,k}\, v^k$

where

$\sum_{k=0}^{N-1} \hat\rho_{i,k}^2 \le \alpha^2 \sum_{k=0}^{N-1} \rho_{i,k}^2$

for each $i$. Now

$G_N(W) = R \sum_{i=0}^{N-1} \beta_i\, v^i \left( \sum_{k=0}^{N-1} \hat\rho_{i,k}\, v^k \right)^t = \sum_{k=0}^{N-1} \left[ R \sum_{i=0}^{N-1} \mu_{i,k}\, v^i \right] (v^k)^t$

where $\mu_{i,k} = \beta_i \hat\rho_{i,k}$. Proposition 4.4 again gives

$G_N(W) = \sum_{k=0}^{N-1} \sum_{i=0}^{N-1} \lambda_{i,k}\, v^i (v^k)^t$

where for each $k$

$\sum_{i=0}^{N-1} \lambda_{i,k}^2 \le \alpha^2 \sum_{i=0}^{N-1} \mu_{i,k}^2$.

Therefore,

$\|G_N(W)\|_F^2 = \sum_{k=0}^{N-1} \sum_{i=0}^{N-1} \lambda_{i,k}^2 \le \alpha^2 \sum_{k=0}^{N-1} \sum_{i=0}^{N-1} \mu_{i,k}^2 \le \alpha^4 \sum_{k=0}^{N-1} \sum_{i=0}^{N-1} (\beta_i \rho_{i,k})^2 = \alpha^4 \sum_{k=0}^{N-1} \sum_{i=0}^{N-1} \zeta_{i,k}^2 = \alpha^4$.

Thus we are led to the following theorem.


Theorem 4.5: If $s_0, \ldots, s_{N-1}$ are linearly independent, then there exists an $\alpha$, $0 < \alpha < 1$, depending only on $(s_0, \ldots, s_{N-1})$ such that $\|G_N\|_F \le \alpha^2 < 1$. Thus the linear transformation $G_N$ is a contraction mapping on the set of $N \times N$ matrices.
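The bounds in Proposition 4.4 and Theorem 4.5 can be checked numerically on a small case. The sketch below is only a sanity check under stated assumptions: the three unit vectors in $\Re^3$ are an arbitrary illustrative choice, the helper names are hypothetical, and $\alpha$ is estimated by random sampling rather than computed exactly.

```python
import math
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def projector(s):
    """Return I - s s^t for a unit vector s."""
    n = len(s)
    return [[(1.0 if i == j else 0.0) - s[i] * s[j] for j in range(n)] for i in range(n)]

def fro(A):
    """Frobenius norm."""
    return math.sqrt(sum(x * x for row in A for x in row))

# Illustrative (assumed) linearly independent unit vectors s_0, s_1, s_2.
S = [unit(v) for v in ([1, 0, 0], [1, 1, 0], [1, 1, 1])]

# R = (I - s_2 s_2^t)(I - s_1 s_1^t)(I - s_0 s_0^t), as in (4.9).
R = projector(S[0])
for s in S[1:]:
    R = matmul(projector(s), R)

random.seed(0)

# Sampled lower estimate of alpha = max |R u| over unit vectors u.
alpha_est = 0.0
for _ in range(2000):
    u = unit([random.gauss(0, 1) for _ in range(3)])
    Ru = [sum(r * x for r, x in zip(row, u)) for row in R]
    alpha_est = max(alpha_est, math.sqrt(sum(x * x for x in Ru)))

# Worst observed ratio ||R W R^t||_F / ||W||_F over random matrices W.
g_ratio = 0.0
for _ in range(500):
    W = [[random.gauss(0, 1) for _ in range(3)] for _ in range(3)]
    GW = matmul(matmul(R, W), transpose(R))
    g_ratio = max(g_ratio, fro(GW) / fro(W))

print(alpha_est, g_ratio)
```

For this particular choice of vectors the nonzero columns of $R$ are orthogonal with squared lengths $1/2$ and $2/3$, so the sampled values should stay below $\sqrt{2/3}$ and $2/3$ respectively.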


Before stating our result on the convergence of the matrices generated by (4.1) we provide the following definitions (see also [7]).


Definition 4.1: Let $\{s_k\}$ be a sequence of unit $N$-vectors. For each $k$, let

$G_{N,k} = G(s_{k+N-1}; \cdot) \circ G(s_{k+N-2}; \cdot) \circ \cdots \circ G(s_k; \cdot)$

and define

$\alpha(k) = \|G_{N,k}\|$.
Definition 4.2: Let $\{s_k\}$ be a sequence of unit $N$-vectors. If there is an infinite subsequence of the integers, $\{k_j\}$, and a constant $\alpha$, $0 < \alpha < 1$, such that $\alpha(k_j) \le \alpha$ for each $j$ then the sequence $\{s_k\}$ is said to be subsequentially linearly independent. In this case, if the subsequence $\{k_j\}$ can be chosen so that

$\chi = \limsup_{j \to \infty} \{ k_{j+1} - k_j - 1 \}$

is finite, then the sequence $\{s_k\}$ is said to be uniformly linearly independent with gap $\chi$.


The $\alpha(k)$ of the definition are well-defined by Theorem 4.5. Note that if there is an $\alpha < 1$ such that $\alpha(k) \le \alpha$ for all $k$ sufficiently large then the gap is zero, in which case we call the sequence $\{s_k\}$ uniformly linearly independent.

To obtain the basic convergence theorem, we put the system (4.2) into the form of the system (3.1) by identifying each $E_k$ with an $N^2$-dimensional vector $z_k$ and each $G(s_k; \cdot)$ with an $N^2 \times N^2$ matrix $A_k$. (The two remaining terms in (3.1) are both zero for all $k$.) Since we are employing the Frobenius norm on the $E_k$, we have $|z_k| = \|E_k\|_F$ and also $\|A_k\| = \|G(s_k; \cdot)\|$. The following proposition will help establish the rate of convergence of the $\{E_k\}$.


Proposition 4.6: Let the sequence $\{s_k\}$ be given and for each $k$ let $A_k$ be the $N^2 \times N^2$ matrix representing $G(s_k; \cdot)$. Let $\{\beta_k\}$ be the constants defined by (3.2). Suppose $\{s_k\}$ is subsequentially linearly independent and let the constant $\alpha < 1$ and the subsequence $\{k_j\}$ be as specified in Definition 4.2. Then

(4.10)  $\|A_k\| \le 1$

for all $k$, and

(4.11)  $\left\| \prod_{i=0}^{N-1} A_{k_j+i} \right\| \le \alpha$

for each $j$. If $\{s_k\}$ is uniformly linearly independent with gap $\chi$ then

(4.12)  $\beta_k(N+\chi) \le \alpha$

for all $k$ sufficiently large.


Proof: (4.10) and (4.11) follow directly from Definition 4.2. Assuming $\{s_k\}$ is uniformly linearly independent with gap $\chi$, let $\{k_j\}$ be the sequence given by Definition 4.2. Then for every $k$ sufficiently large there is a $k_j$ such that $k_j - \chi \le k \le k_j$. Therefore

$\left\| \prod_{i=0}^{N+\chi-1} A_{k+i} \right\| \le \left( \prod_{i=k-k_j}^{-1} \|A_{k_j+i}\| \right) \left\| \prod_{i=0}^{N-1} A_{k_j+i} \right\| \left( \prod_{i=0}^{k-k_j+\chi-1} \|A_{k_j+N+i}\| \right) \le \alpha < 1$

which yields (4.12). •


The main result of this section is now just a straightforward application of Theorem 3.1 (with the two remaining terms of (3.1) both zero) and Proposition 4.6.


Theorem 4.7: Suppose the sequence $\{s_k\}$ is subsequentially linearly independent. Then the sequence $\{E_k\}$ defined by (4.1) converges to the zero matrix from any symmetric starting matrix $E_0$. If the sequence $\{s_k\}$ is uniformly linearly independent with gap $\chi$ then the convergence rate is $(N+\chi)$-step linear.
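Theorem 4.7 and the notion of a gap in Definition 4.2 can be illustrated with a small experiment. The sketch below is an assumption-laden example, not part of the paper: in $\Re^2$ it repeats the pattern $a, a, b$ with $a = (1,0)^t$ and $b = (1,1)^t/\sqrt{2}$, so that consecutive pairs $(a,a)$ are dependent while $(a,b)$ and $(b,a)$ are independent, giving gap $\chi = 1$ and an expected $(N+\chi) = 3$-step linear rate.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def projector(s):
    """Return I - s s^t for a unit vector s."""
    n = len(s)
    return [[(1.0 if i == j else 0.0) - s[i] * s[j] for j in range(n)] for i in range(n)]

def fro(A):
    """Frobenius norm."""
    return math.sqrt(sum(x * x for row in A for x in row))

def l_update(E, s):
    """One step of (4.1): E -> (I - s s^t) E (I - s s^t) for a unit vector s."""
    P = projector(s)
    return matmul(matmul(P, E), P)

a = [1.0, 0.0]
b = [1.0 / math.sqrt(2.0), 1.0 / math.sqrt(2.0)]
pattern = [a, a, b]                 # repeated: a, a, b, a, a, b, ...

E = [[3.0, -2.0], [-2.0, 5.0]]      # arbitrary symmetric starting matrix E_0
norms = [fro(E)]
for k in range(30):
    E = l_update(E, pattern[k % 3])
    norms.append(fro(E))

# Print the Frobenius norm at the start of each 3-step window.
for k in range(0, 31, 3):
    print(k, norms[k])
```

Over any three consecutive steps the norm is at least halved (the pair $(a,b)$ has $\alpha^2 = 1/2$), whereas the two-step window $(a,a)$ gives no reduction at all after its first step; this is the behavior the gap is meant to capture.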


5.  Extensions of convergence results for the L update


In the previous section we proved that under relatively weak conditions on the sequence $\{s_k\}$ the L update leads to the convergence of the sequence $\{E_k\}$ regardless of the starting point. In this section we extend that result by relaxing the conditions on the sequence $\{s_k\}$.


As was pointed out in Section 1, sequences $\{s_k\}$ can be constructed for which the corresponding $\{E_k\}$ will not converge. This was demonstrated by Ge and Powell [4] for the two-dimensional case. In their example, the vectors $\{s_k\}$ are pairwise linearly independent but become more nearly identical while not converging to a single vector. Thus the vectors are not subsequentially linearly independent and they do not converge to a proper subspace of $\Re^2$. This latter property is also necessary for nonconvergence of the $\{E_k\}$ in a sense that will be made clear in the following.


Theorem 4.7 is predicated on the (subsequential) uniform linear independence of the sequence $\{s_k\}$, which requires that they span $\Re^N$. We now relax this restriction and assume instead that they span a proper $M$-dimensional subspace, $S$, of $\Re^N$. In order to analyze this case we let $P$ be an $N \times M$ matrix whose columns are an orthonormal basis for $S$ and let $Q$ be an $N \times (N-M)$ matrix whose columns are an orthonormal basis for $S^\perp$. For such $P$ and $Q$ any $N \times N$ symmetric matrix $E$ can be written


(5.1)  $E = P P^t E P P^t + Q Q^t E Q Q^t + P P^t E Q Q^t + Q Q^t E P P^t$.


Note that $P P^t$ is the projection of $\Re^N$ onto $S$ and $Q Q^t = I - P P^t$ is the projection onto $S^\perp$.


For each $k$ we let $r_k = P^t s_k$, where $r_k \in \Re^M$ and $|r_k| = 1$. For a sequence $\{E_k\}$ satisfying (4.1), we set


(5.2a)  $V_k = P^t E_k P$,

(5.2b)  $U_k = Q^t E_k Q$,

(5.2c)  $Y_k = P^t E_k Q$,

so that

(5.3)  $E_k = P V_k P^t + Q U_k Q^t + P Y_k Q^t + Q Y_k^t P^t$.

Note that $Y_k = 0$ if and only if $S$ is an eigenspace of $E_k$. Since $P^t P = I_M$ (the $M \times M$ identity matrix) and $Q^t s_k = 0$, we see from (4.1) that these matrices satisfy the following uncoupled difference equations

(5.4a)  $V_{k+1} = (I_M - r_k r_k^t)\, V_k\, (I_M - r_k r_k^t)$,

(5.4b)  $U_{k+1} = U_k$,

(5.4c)  $Y_{k+1} = (I_M - r_k r_k^t)\, Y_k$.
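The three difference equations can be obtained from (4.1) in a few lines. The derivation below is a sketch that uses only the facts stated in the text ($s_k = P r_k$ for $s_k \in S$, $P^t P = I_M$, and $P^t Q = 0$).

```latex
\[
\begin{aligned}
P^t (I - s_k s_k^t) &= P^t - (P^t P)\, r_k r_k^t P^t = (I_M - r_k r_k^t)\, P^t, \\
(I - s_k s_k^t)\, Q &= Q - P r_k r_k^t (P^t Q) = Q, \\[4pt]
V_{k+1} &= P^t (I - s_k s_k^t) E_k (I - s_k s_k^t) P
         = (I_M - r_k r_k^t)\, P^t E_k P\, (I_M - r_k r_k^t), \\
U_{k+1} &= Q^t (I - s_k s_k^t) E_k (I - s_k s_k^t) Q = Q^t E_k Q, \\
Y_{k+1} &= P^t (I - s_k s_k^t) E_k (I - s_k s_k^t) Q = (I_M - r_k r_k^t)\, P^t E_k Q .
\end{aligned}
\]
```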


Equation (5.4a) shows that the $M \times M$ matrices, $V_k$, satisfy the updating scheme (4.1); (5.4b) that the $(N-M) \times (N-M)$ matrices, $U_k$, are constant with respect to $k$; and (5.4c) that the $M \times (N-M)$ matrices, $Y_k$, satisfy a one-sided type of (4.1) update. Thus if the sequence $\{r_k\}$ is uniformly linearly independent with gap $\chi$ it follows from Theorem 4.7 that $V_k$ converges to the $M \times M$ zero matrix and that the rate of convergence is $(M+\chi)$-step linear. A modification of the proofs of the propositions and theorems in Section 4 can be used to show that the $Y_k$ also converge to a zero matrix at the $(M+\chi)$-step linear rate. It is perhaps easier, however, to note that the symmetric positive semidefinite matrices $Z_k = Y_k Y_k^t$ satisfy the same equation as the $V_k$ and hence tend to zero; the R-linear convergence of the $Y_k$ to the zero matrix then follows from the convergence of the $Z_k$. If the sequence $\{r_k\}$ is only subsequentially linearly independent then the convergence of the $V_k$ and $Y_k$ still holds but there is no multi-step linear rate. As a result of these observations we have the following theorem, which shows that the restriction of the $\{s_k\}$ to a subspace does not prevent the convergence of the $\{E_k\}$.


Theorem 5.1: Let $S$ be an $M$-dimensional subspace of $\Re^N$ and let the matrices $P$ and $Q$ be defined as above. Suppose the sequence of $N$-vectors $\{s_k\}$ is contained in $S$ and that the sequence of $M$-vectors, $\{r_k\}$, defined by $r_k = P^t s_k$ is subsequentially linearly independent. Then for any initial symmetric matrix $E_0$, the sequence $\{E_k\}$ defined by (4.1) converges to the matrix $Q Q^t E_0 Q Q^t$. If the sequence $\{r_k\}$ is uniformly linearly independent with gap $\chi$ then the convergence rate is at least R-linear.


Proof: By the remarks above, $V_k \to 0$, $Y_k \to 0$, and $U_k = U_0$ for all $k$, so the convergence result follows from (5.3). The R-linear convergence follows from the fact that the components of $E_k$ have $(M+\chi)$-step linear, and hence R-linear, convergence, which implies (by the remarks in Section 3) the R-linear convergence of the $E_k$. •
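The limit in Theorem 5.1 can be observed directly in a small case. In the sketch below, which is illustrative only, $S$ is taken to be the span of the first two coordinate vectors of $\Re^3$ (so $P = [e_1, e_2]$ and $Q = [e_3]$), and the $s_k$ alternate between two independent unit vectors in $S$; the matrices and names are hypothetical choices.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def projector(s):
    """Return I - s s^t for a unit vector s."""
    n = len(s)
    return [[(1.0 if i == j else 0.0) - s[i] * s[j] for j in range(n)] for i in range(n)]

def fro(A):
    """Frobenius norm."""
    return math.sqrt(sum(x * x for row in A for x in row))

def l_update(E, s):
    """One step of (4.1) for a unit vector s."""
    P = projector(s)
    return matmul(matmul(P, E), P)

c = 1.0 / math.sqrt(2.0)
steps = [[1.0, 0.0, 0.0], [c, c, 0.0]]   # unit vectors in S = span{e_1, e_2}

E0 = [[2.0, -1.0, 0.5],
      [-1.0, 3.0, 1.5],
      [0.5, 1.5, -4.0]]                  # arbitrary symmetric E_0
E = [row[:] for row in E0]
for k in range(80):
    E = l_update(E, steps[k % 2])

# Predicted limit Q Q^t E_0 Q Q^t: only the (3,3) entry of E_0 survives.
limit = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, E0[2][2]]]
err = fro([[E[i][j] - limit[i][j] for j in range(3)] for i in range(3)])
print(err)
```

The $U$ block (here the single entry in position (3,3)) is untouched by every step, in agreement with (5.4b), while the $V$ and $Y$ blocks are driven to zero.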


The next result is somewhat surprising in that it demonstrates that if $\{s_k\}$ converges at a reasonable rate to a subspace, convergence of the $\{E_k\}$ can still be obtained. The importance of this type of convergence in constrained optimization will be discussed in Section 8.

Definition 5.1: Let $S$ be an $M$-dimensional subspace of $\Re^N$ and suppose that $\{s_k\}$ is a sequence of unit vectors that satisfy $s_k = u_k + v_k$ where $u_k \in S$, $v_k \in S^\perp$, and $|v_k| \to 0$ as $k \to \infty$. Then we say that $\{s_k\}$ converges to $S$. If $|v_k|$ tends to zero R-linearly, we say that $\{s_k\}$ converges to $S$ at an R-linear rate.


Theorem 5.2: Suppose that the $\{s_k\}$ converge to an $M$-dimensional subspace $S$ at an R-linear rate. Let matrices $P$ and $Q$ be defined as above, let $\{r_k\}$ be the sequence of unit $M$-vectors given by


(5.5)  $r_k = P^t u_k / |u_k|$, where $s_k = u_k + v_k$ as in Definition 5.1,


and assume that $\{r_k\}$ is subsequentially linearly independent. Then for any initial matrix $E_0$ there is an $(N-M) \times (N-M)$ matrix $U^*$ such that the sequence $\{E_k\}$ defined by (4.1) converges to $Q U^* Q^t$. If $\{r_k\}$ is uniformly linearly independent with gap $\chi$ then the convergence rate is R-linear.


Proof: Let $s_k = u_k + v_k$ as in Definition 5.1. Then

$s_k = \frac{u_k}{|u_k|} + \alpha_k u_k + v_k$

where $\alpha_k = (|u_k| - 1)/|u_k|$, and from (5.5)

$r_k = P^t u_k / |u_k|$

and hence

$s_k = P r_k + \alpha_k u_k + v_k$.

It can be shown that the conditions $|s_k| = 1$ and $|v_k| \to 0$ R-linearly imply that $\alpha_k \to 0$ R-linearly. Thus we can write

(5.6)  $s_k = P r_k + t_k$

where $|t_k| \to 0$ R-linearly. Now using (5.1) and letting $V_k$, $U_k$, and $Y_k$ be defined as in (5.2) we obtain the equations

(5.7a)  $V_{k+1} = (I - r_k r_k^t)\, V_k\, (I - r_k r_k^t) + \mathcal{V}(E_k, r_k, t_k)$,

(5.7b)  $U_{k+1} = U_k + \mathcal{U}(E_k, r_k, t_k)$,

(5.7c)  $Y_{k+1} Y_{k+1}^t = (I - r_k r_k^t)\, Y_k Y_k^t\, (I - r_k r_k^t) + \mathcal{Y}(E_k, r_k, t_k)$,

where, since the $E_k$ and $r_k$ are uniformly bounded, the nonlinear terms $\mathcal{V}$, $\mathcal{U}$, and $\mathcal{Y}$ are $O(|t_k|)$. Thus the system (5.7) can be put in the framework of (3.1) by identifying $V_k$ and $Y_k Y_k^t$ with $z_k$, and $U_k$ with $w_k$. The conclusion then follows from the application of Theorem 3.1. •
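The behavior described by Theorem 5.2 can also be observed numerically. The sketch below is an illustrative experiment under assumed data: $S$ is the span of $e_1, e_2$ in $\Re^3$, and each $s_k$ is a unit vector whose component along $e_3$ decays like $2^{-(k+1)}$, so the $s_k$ converge to $S$ at an R-linear rate without ever lying in it.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l_update(E, s):
    """One step of (4.1) for a unit vector s."""
    n = len(s)
    P = [[(1.0 if i == j else 0.0) - s[i] * s[j] for j in range(n)] for i in range(n)]
    return matmul(matmul(P, E), P)

c = 1.0 / math.sqrt(2.0)
in_plane = [[1.0, 0.0], [c, c]]          # alternating directions inside S

E0 = [[2.0, -1.0, 0.5],
      [-1.0, 3.0, 1.5],
      [0.5, 1.5, -4.0]]                  # arbitrary symmetric E_0
E = [row[:] for row in E0]
u_at_80 = None
for k in range(120):
    r = in_plane[k % 2]
    s = unit([r[0], r[1], 0.5 ** (k + 1)])   # off-subspace part decays R-linearly
    E = l_update(E, s)
    if k == 79:
        u_at_80 = E[2][2]

# Size of the V and Y blocks (every entry except position (3,3)).
top = math.sqrt(sum(E[i][j] ** 2 for i in range(3) for j in range(3)
                    if not (i == 2 and j == 2)))
print(top, E[2][2], E0[2][2])
```

In contrast to Theorem 5.1, the surviving block $U^*$ need not equal the corresponding block of $E_0$, because the early steps, which have a visible component outside $S$, modify it before the sequence settles.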


As a result of this theorem, it is seen that nonconvergence of the $\{E_k\}$ can occur only if the sequence $\{s_k\}$ does not approach any fixed subspace R-linearly in such a way that the projections onto the subspace are subsequentially linearly independent. As noted above, in the Ge and Powell example, the vectors approach each other but not a fixed subspace.


6.  Convergence results for other updates


In this section we extend the convergence results obtained in Sections 4 and 5 to the update process (2.5) for $\phi_k$ other than zero. This will be accomplished by considering the general update as a perturbation of the L update. That is, from (2.5)

(6.1)  $E_{k+1} = G(s_k; E_k) + \phi_k H(s_k; E_k)$

where $H: L(\Re^N, \Re^N) \to L(\Re^N, \Re^N)$ is a nonlinear mapping. If $\|E_k\|_F$ is sufficiently small then

$(1 + s_k^t E_k s_k)^{-1} = 1 + O(\|E_k\|_F)$

and hence

(6.2)  $\|H(s_k; E_k)\|_F \le \sigma\, (\|E_k\|_F)^2$

where $\sigma$ is a constant independent of $k$. This fact leads directly to the next theorem.


Theorem 6.1: Let $\{s_k\}$ be uniformly linearly independent with gap $\chi$ and let the $\phi_k$ satisfy

(6.3)  $|\phi_k| \le \kappa \cdot (1 + \|E_k\|_F)$

for some constant $\kappa$ independent of $k$ and $\|E_k\|_F$ sufficiently small. Then there exists a positive constant $\delta$ such that if $E_0$ is any symmetric matrix satisfying $\|E_0\|_F < \delta$ the sequence $\{E_k\}$ defined by (2.5) converges to the zero matrix and the rate of convergence is $(N+\chi)$-step linear.


Proof: As in the proof of Theorem 4.7 we identify $E_k$ with $z_k$ and $G(s_k; \cdot)$ with $A_k$. Now identifying the term $\phi_k H(s_k; \cdot)$ with the corresponding nonlinear term in (3.1) and applying (6.2) and (6.3) assures that the hypotheses of Theorem 3.2 are satisfied. (No $w_k$ is present in this application.) The result follows. •


The boundedness requirement on the $\phi_k$, (6.3), generally rules out dynamic processes such as the SR1 in which the $\phi_k$ can be arbitrarily large for small $E_k$ (although the SR1 update itself will converge finitely in this case). There are two significant differences between Theorem 6.1 and the corresponding theorem for the L update, Theorem 4.7. First, it appears that uniform linear independence is needed as a hypothesis for Theorem 6.1; otherwise, the gap between successive values of the sequence $\{k_j\}$ in Definition 4.2 could become large enough to allow the quadratic terms in $H(s_k; \cdot)$ to grow so rapidly as to preclude convergence of the solution of (3.1) to zero. More importantly, Theorem 6.1 yields local convergence only; inequality (6.2) is not valid without the assumption that $\|E_0\|_F$ is small. However, as will be observed in Section 7, for certain starting matrices the convergence of the C updating process can be obtained without these two restrictions.

To obtain the convergence of the general updating scheme (2.5) when the $s_k$ lie on a subspace we apply the perturbation technique of Theorem 6.1 to the analysis in Theorem 5.1. Defining the matrices $P$, $Q$, $V_k$, $U_k$, and $Y_k$ as in Section 5 and setting $r_k = P^t s_k$ permit us to decompose (2.5) into the system
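(6.4a)  $V_{k+1} = (I - r_k r_k^t)\, V_k\, (I - r_k r_k^t) - \phi_k \left( \frac{1}{1 + r_k^t V_k r_k} \right) [(r_k^t V_k r_k)\, r_k - V_k r_k]\, [(r_k^t V_k r_k)\, r_k - V_k r_k]^t$,


(6.4b)  $Y_{k+1} = (I - r_k r_k^t)\, Y_k - \phi_k \left( \frac{1}{1 + r_k^t V_k r_k} \right) [(r_k^t V_k r_k)\, r_k - V_k r_k]\, [r_k^t Y_k]$,


(6.4c)  $U_{k+1} = U_k - \phi_k \left( \frac{1}{1 + r_k^t V_k r_k} \right) Y_k^t r_k r_k^t Y_k$.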


From these equations we obtain the following result.


Theorem 6.2: Let $S$ be an $M$-dimensional subspace of $\Re^N$ and let the matrices $P$, $Q$, $V_k$, $Y_k$, and $U_k$ be defined as in Section 5. Suppose the sequence of $N$-vectors $\{s_k\}$ is contained in $S$ and that the sequence of $M$-vectors, $\{r_k\}$, defined by $r_k = P^t s_k$ is uniformly linearly independent with gap $\chi$. In addition suppose that the $\phi_k$ satisfy

(6.5)  $|\phi_k| \le \kappa\, (1 + \|P^t E_k P\|_F)$

for $\|P^t E_k P\|_F$ sufficiently small, where $\kappa$ is independent of $k$. Then there exists a positive constant $\delta$ such that for any initial symmetric matrix $E_0$ satisfying $\|P^t E_0 P\|_F < \delta$ the sequence $\{E_k\}$ defined by (2.5) converges to a matrix of the form $Q U^* Q^t$ and the convergence rate is at least R-linear.


Proof: The equation (6.4a) for $V_k$ is identical in form to that of (2.5). Therefore, by Theorem 6.1, $V_k \to 0$ at an $(N+\chi)$-step rate and hence an R-linear rate. Then by identifying $V_k$ with $t_k$, $Y_k Y_k^t$ with $z_k$, and $U_k$ with $w_k$ it is seen that the last two equations in (6.4) have the form of the system (3.1) and satisfy the hypotheses of Theorem 3.2. The result follows. •


The matrix $U^*$ is not specified by the theorem. In particular it is not $Q^t E_0 Q$ as in Theorem 5.1. It should also be observed that the convergence depends only on the initial value of the $P P^t E_0 P P^t$ component of $E_0$.

If the $s_k$ are not on the subspace $S$ but converge to it in an R-linear manner, then by combining the analysis of Theorem 5.2 with that of the preceding theorem the following result can be obtained.


Theorem 6.3: Let the subspace $S$ be given and let the matrices $P$ and $Q$ be defined as above. For each $k$ let $s_k = P r_k + t_k$ where $\{r_k\}$ is the sequence of $M$-vectors defined by (5.5) and $t_k$ is defined by (5.6). Assume that the $\{r_k\}$ are uniformly linearly independent with gap $\chi$ and that

(6.6)  $|t_k| \le |t_0|\, \tau^k$

for some constant $\tau$, $0 < \tau < 1$. Also assume that the sequence $\{\phi_k\}$ satisfies

(6.7)  $|\phi_k| \le \kappa\, (1 + \|P^t E_k P\|_F + |t_k| \cdot \|E_k\|_F)$

for some constant $\kappa$ independent of $k$. Let $E_0$ be a given symmetric matrix. Then there exist positive constants $\tau_0 = \tau(E_0)$ and $\delta$ such that if $\|P^t E_0 P\|_F + \|P^t E_0 Q\|_F < \delta$ and $|t_0| < \tau_0$, the sequence $\{E_k\}$ defined by (2.5) converges R-linearly to a matrix of the form $Q U^* Q^t$.


Proof: (6.6) implies that $|t_k| \to 0$ at an R-linear rate (and hence $\{s_k\}$ converges to $S$ at an R-linear rate). Using (5.2) we obtain, from (2.5),

$V_{k+1} = (I - r_k r_k^t)\, V_k\, (I - r_k r_k^t) - \phi_k \left( \frac{1}{1 + r_k^t V_k r_k} \right) [(r_k^t V_k r_k)\, r_k - V_k r_k]\, [(r_k^t V_k r_k)\, r_k - V_k r_k]^t + \mathcal{V}(r_k, V_k, U_k, Y_k, t_k)$,

$Y_{k+1} = (I - r_k r_k^t)\, Y_k - \phi_k \left( \frac{1}{1 + r_k^t V_k r_k} \right) [(r_k^t V_k r_k)\, r_k - V_k r_k]\, [r_k^t Y_k] + \mathcal{Y}(r_k, V_k, U_k, Y_k, t_k)$,

$U_{k+1} = U_k - \phi_k \left( \frac{1}{1 + r_k^t V_k r_k} \right) Y_k^t r_k r_k^t Y_k + \mathcal{U}(r_k, V_k, U_k, Y_k, t_k)$.

Since the $r_k$ are uniformly bounded, $\mathcal{V}$, $\mathcal{U}$, and $\mathcal{Y}$ are $O((\|V_k\|_F + \|U_k\|_F + \|Y_k\|_F) \cdot |t_k|)$. Using (6.7) and identifying $V_k$ and $Y_k Y_k^t$ with $z_k$ and $U_k$ with $w_k$ in (3.1) we see that the hypotheses of Theorem 3.2 are satisfied and hence the results follow. •


The hypotheses of this theorem do not require that the component of $E_0$ relative to $S^\perp$ be small; however, they do require that the components of $\{s_k\}$ start and remain close to $S$. An analysis of the proof of Theorem 3.2 will show that if the component of $E_0$ relative to $S^\perp$ is sufficiently small then the requirement that the $\{s_k\}$ start close to $S$ can be relaxed, although the sequence still must converge to $S$ at an R-linear rate.


7.  Application to updates of $H_k$


In the previous three sections we have established the convergence of the sequence of matrices $\{E_k\}$ generated by the system (2.5) under certain restrictions on the sequences $\{s_k\}$ and $\{\phi_k\}$. In this section we apply those results to the sequence $\{H_k\}$ generated by (1.5). While the results obtained so far may be of interest in their own right, the updating schemes (1.5) are of practical interest in the optimization algorithms that motivated this study and are, consequently, of more import. For the simplest (quadratic) case, when $y_k = s_k$, the translation of the major theorems of Sections 4, 5, and 6 into results for the system (2.2) can be carried out by a straightforward substitution of $H_k - I$ for $E_k$. The results of this transformation are given in the next two theorems in condensed form; the first theorem deals with the case where the sequence $\{s_k\}$ remains on the subspace $S$ (which may be $\Re^N$) and the second with the case where the $\{s_k\}$ converge to a proper subspace $S$.




Theorem 7.1: Let $S$ be an $M$-dimensional subspace of $\Re^N$ with $M \le N$ and let the matrices $P$ and $Q$ be defined relative to $S$ as in Section 5 ($P = I$ and $Q = 0$ if $M = N$). Suppose each vector in the sequence of unit $N$-vectors, $\{s_k\}$, is in $S$ and suppose that the sequence $\{r_k\}$, defined by $r_k = P^t s_k$, is uniformly linearly independent with gap $\chi$ in $S$. Let $H_0$ be a symmetric matrix and generate the sequence $\{H_k\}$ by (2.2). Then

(i) if $\phi_k = 0$ for each $k$,

$\lim_{k \to \infty} H_k = Q Q^t H_0 Q Q^t + P P^t$;

or,

(ii) if the $\phi_k$ satisfy $|\phi_k| \le \kappa\, (1 + \|P^t H_k P\|_F)$ for some $\kappa$ independent of $k$ and $\|P^t H_0 P - P P^t\|_F$ is sufficiently small, then for some $N \times N$ matrix $H^*$

$\lim_{k \to \infty} H_k = Q Q^t H^* Q Q^t + P P^t$.

The convergence rate of the matrices is $(N+\chi)$-step linear if $S = \Re^N$ and R-linear if $S$ is a proper subspace of $\Re^N$. The uniform linear independence can be replaced in case (i) by subsequential linear independence, but the R-linear rate of convergence is no longer guaranteed.


Theorem 7.2: Let $S$ be a subspace of $\Re^N$ and assume that the sequence $\{s_k\}$ satisfies the hypotheses of Theorem 6.3. Assume that the $\phi_k$ satisfy

$|\phi_k| \le \kappa\, (1 + \|P^t H_k P\|_F + |t_k| \cdot \|H_k\|_F)$

for some $\kappa$ independent of $k$. Let $H_0$ be a given symmetric matrix. Then there exist constants $\tau_0 = \tau(H_0)$ and $\delta$ such that if $\|P^t H_0 P + P^t H_0 Q - P P^t\|_F < \delta$ and $|t_0| < \tau_0$ then the sequence $\{H_k\}$ generated by (1.5) satisfies

$\lim_{k \to \infty} H_k = Q H^* Q^t + P P^t$

for some $(N-M) \times (N-M)$ matrix $H^*$ and the convergence rate is R-linear. If $\phi_k = 0$ for all $k$ then the restriction on the initial matrix $H_0$ is not required.


Thus, generally speaking, the sequence of updates defined by (2.2) converges to a matrix which is the identity on the subspace spanned by the sequence $\{s_k\}$ (possibly in a limiting sense) provided the initial matrix is nearly the projection $P P^t$. In particular, the subspaces $S$ and $S^\perp$ are eigenspaces of the limiting matrix.


We now drop the assumptions that each $s_k$ is a unit vector and that $y_k = s_k$, and require instead that at each iteration

(7.1)  $y_k = s_k + \xi_k$

where $|\xi_k| / |s_k| \to 0$ as $k \to \infty$. As was discussed in Section 2 this assumption is motivated by the unconstrained optimization problem where $s_k$ represents the difference in successive iterates. We then have the following theorem.


Theorem 7.3: Let $\{s_k\}$ satisfy the hypotheses of Theorem 6.3. Let the sequence $\{y_k\}$ satisfy (7.1) and assume that $|\xi_k| / |s_k| \to 0$ at an R-linear rate and $|\xi_k| / |s_k| \le \eta_0\, \eta^k$ for some $\eta_0$, where $0 < \eta < 1$. Assume that the sequence $\{\phi_k\}$ satisfies

$|\phi_k| \le \kappa\, (1 + \|P^t H_k P\|_F + (|t_k| + |\xi_k| / |s_k|) \cdot \|H_k\|_F)$

for some constant $\kappa$ independent of $k$. Let $H_0$ be a symmetric matrix. Then there exist positive constants $\delta$ and $\tau_0 = \tau(H_0)$ such that for $\|P^t H_0 P + P^t H_0 Q - P P^t\|_F < \delta$, $\eta_0 < \tau_0$, and $|t_0| < \tau_0$ the sequence $\{H_k\}$ generated by (1.5) satisfies

$\lim_{k \to \infty} H_k = Q H^* Q^t + P P^t$

for some $(N-M) \times (N-M)$ matrix $H^*$ and the convergence rate is R-linear. If $\phi_k = 0$ for all $k$ then no restriction is required on the initial matrix $H_0$.


Proof: Let (7.1) hold. Then

$y_k^t s_k = s_k^t s_k + \xi_k^t s_k$

and hence if $|\xi_k| / |s_k|$ is sufficiently small

(7.2)  $(y_k^t s_k)^{-1} = (s_k^t s_k)^{-1}\, (1 + O(|\xi_k| / |s_k|))$.

Substituting (7.1) and (7.2) into the update formula (1.5), making the changes of variables that replace $s_k$ by $s_k / |s_k|$ and $y_k$ by $y_k / |s_k|$ and set $\eta_k = \xi_k / |s_k|$, and replacing, as before, $H_k$ by $E_k + I$ yield an equation of the form

(7.3)  $E_{k+1} = G(s_k; E_k) + \phi_k H(s_k; E_k) + W(\phi_k, s_k, E_k, \eta_k)$,

where now $|s_k| = 1$ and $|\eta_k| \le |\eta_0| \cdot \eta^k$. Using the bound on the $\phi_k$ it is seen that

(7.4)  $\|W(\phi_k, s_k, E_k, \eta_k)\|_F \le \kappa(\|E_k\|_F) \cdot |\eta_k|$.

Now the $s_k$ can be decomposed as in Theorem 6.3 and the results of Section 3 (in particular Theorem 3.2) can be applied just as before. •


It may be disconcerting to observe that Theorems 7.1-7.3 yield global convergence (i.e., unrestricted initial matrix) of the $\{H_k\}$ only in the case where $\phi_k$ is zero, i.e., the L update. Proposition 2.2 allows us to extend these results to the C updating formula for positive definite starting matrices.


Theorem 7.4: If $H_0$ is positive definite and $\phi_k = 1$ for all $k$, then in each of Theorems 7.1-7.3 the convergence and rate of convergence of the sequence $\{H_k\}$ are independent of the initial matrix. In the case where the $s_k$ all lie on a subspace $S$ and $y_k = s_k$ for every $k$ the uniform linear independence assumption can be relaxed to subsequential linear independence and the limit has the form


$Q H^* Q^t + P P^t = [\, Q Q^t H_0^{-1} Q Q^t + P P^t \,]^{-1}$.
Proof: The class of positive definite matrices satisfies the conditions of Proposition 2.2 so the sequence converges when the C update is used. Since the sequence is converging the iterates will eventually get close enough to their limit to assure that the initial matrix requirements of Theorems 7.1-7.3 are satisfied. The desired convergence rate is then achieved. The final form in the special case derives from the fact that the inverse of the C update of a matrix is always the L update of the inverse of the matrix. If only subsequential linear independence of $\{s_k\}$ is assumed in this case then the iterates will converge by the argument of Proposition 2.2 but no rate of convergence is specified. •
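The duality used in this proof can be checked numerically in the quadratic case $y = s$, $|s| = 1$. The sketch below rests on two assumptions, since (1.5) is not reproduced here: the L update of $H$ is taken as $(I - s s^t) H (I - s s^t) + s s^t$ (which is (4.1) written for $H = E + I$), and the C update is taken to be the complementary BFGS-type formula $H - H s s^t H / (s^t H s) + s s^t$. The function names are hypothetical.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(A):
    """Inverse of a 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def l_update(H, s):
    """Assumed L update for y = s, |s| = 1: (I - s s^t) H (I - s s^t) + s s^t."""
    P = [[(1.0 if i == j else 0.0) - s[i] * s[j] for j in range(2)] for i in range(2)]
    PHP = matmul(matmul(P, H), P)
    return [[PHP[i][j] + s[i] * s[j] for j in range(2)] for i in range(2)]

def c_update(H, s):
    """Assumed C (BFGS-type) update for y = s, |s| = 1."""
    Hs = [sum(H[i][j] * s[j] for j in range(2)) for i in range(2)]
    sHs = sum(s[i] * Hs[i] for i in range(2))
    return [[H[i][j] - Hs[i] * Hs[j] / sHs + s[i] * s[j] for j in range(2)]
            for i in range(2)]

H0 = [[2.0, 0.5], [0.5, 1.0]]       # positive definite starting matrix (illustrative)
s = [0.6, 0.8]                      # unit vector

# Duality: inverse of the C update equals the L update of the inverse.
lhs = inv2(c_update(H0, s))
rhs = l_update(inv2(H0), s)
duality_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))

# Special case of Theorem 7.4 with S = span{e_1}: one C step already gives the limit.
H1 = c_update(H0, [1.0, 0.0])
Hinv = inv2(H0)
predicted = inv2([[1.0, 0.0], [0.0, Hinv[1][1]]])   # [Q Q^t H0^{-1} Q Q^t + P P^t]^{-1}
print(duality_gap, H1, predicted)
```

With $S$ spanned by $e_1$, repeating the C step with $s = e_1$ leaves the matrix unchanged, so the limit is reached after one step and agrees with the closed form in Theorem 7.4 under the assumptions above.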


In addition to the class of "direct" Broyden updates generated by (1.5) there is the class of "inverse" Broyden updates that are obtained from (1.5) by interchanging $s$ and $y$. In the unconstrained optimization setting this class of updates generates sequences $\{H_k\}$ that represent approximations to the inverse of the Hessian matrix of the objective function. When $y_k = s_k$ the resulting class reduces to (2.2) and the theorems of Sections 4, 5, and 6 apply. When (7.1) holds, Theorem 7.3 can be repeated for the "inverse" class with no essential change. Only the function $W$ is slightly different, although it still satisfies (7.4).


When the theorems above are applied to the unconstrained optimization problem, it is under the assumption that the Hessian matrix of the objective function at the optimal solution is the identity. Since the form of the Broyden class of updates (1.5), as well as the form of the "inverse" class, is invariant under the transformations (T) in Section 2, there is no loss in generality in making this assumption. The limiting value of the Hessian approximations in the untransformed case will be the Hessian matrix $F$ if the vectors $F^{1/2} s_k$, equivalently the vectors $s_k$, do not approach a proper subspace of $\Re^N$. If $\{F^{1/2} s_k\}$ approaches a subspace $S$ which is an eigenspace of $F$, the matrices $P$ and $Q$ can be chosen so that the columns are eigenvectors of $F^{1/2}$ (and, hence, $F$). Then

$F^{1/2} [P, Q] = [P, Q]\, D = [P, Q] \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix}$

where $D_1$ and $D_2$ are the diagonal matrices whose diagonal elements are the square roots of the eigenvalues of $F$ on the subspaces $S$ and $S^\perp$ respectively. Now the general form for the limiting matrix in the transformed variables is

$P P^t + Q U^* Q^t$

so in the original variables

$\lim_{k \to \infty} H_k = F^{1/2} [P, Q] \begin{pmatrix} I & 0 \\ 0 & U^* \end{pmatrix} [P, Q]^t F^{1/2} = [P, Q]\, D \begin{pmatrix} I & 0 \\ 0 & U^* \end{pmatrix} D\, [P, Q]^t = P D_1^2 P^t + Q U Q^t$

for some matrix $U$. If $S$ is not an eigenspace of $F$ then the columns of $P$ and $Q$ cannot be chosen as eigenvectors of $F$ and the limit is $F^{1/2} [P P^t + Q U^* Q^t] F^{1/2}$.


Finally, it should be pointed out that other rank two updates not of the Broyden class may be able to be analyzed by the methods presented here. For example, consider the well-known PSB update which has the form:


$\mathcal{U}(H, s, y) = H + \frac{(y - Hs)\, s^t + s\, (y - Hs)^t}{s^t s} - \frac{(y - Hs)^t s}{(s^t s)^2}\, s s^t$.
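The PSB formula above can be exercised on a small example. The following sketch uses the standard PSB update as written there; the matrices `F` and `H`, the step `s`, and the function names are illustrative assumptions. It checks the secant condition and, for $y = F s$ with $|s| = 1$, that the error $E = H - F$ is transformed exactly as in the L updating scheme (4.1), even though the chosen $F$ is indefinite.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def psb_update(H, s, y):
    """Standard PSB update of the symmetric matrix H for the pair (s, y)."""
    n = len(s)
    r = [yi - hi for yi, hi in zip(y, matvec(H, s))]   # residual y - H s
    ss = sum(x * x for x in s)                         # s^t s
    rs = sum(ri * si for ri, si in zip(r, s))          # (y - H s)^t s
    return [[H[i][j] + (r[i] * s[j] + s[i] * r[j]) / ss - rs * s[i] * s[j] / ss ** 2
             for j in range(n)] for i in range(n)]

F = [[1.0, 2.0], [2.0, -3.0]]      # symmetric but indefinite (illustrative)
H = [[4.0, 1.0], [1.0, 2.0]]       # current symmetric approximation
s = [0.6, 0.8]                     # unit step
y = matvec(F, s)                   # y = F s, i.e. zero perturbation

H_new = psb_update(H, s, y)

# Secant condition: H_new s should reproduce y.
secant_residual = max(abs(a - b) for a, b in zip(matvec(H_new, s), y))

# Compare H_new - F with the L update (4.1) applied to E = H - F.
P = [[(1.0 if i == j else 0.0) - s[i] * s[j] for j in range(2)] for i in range(2)]
E = [[H[i][j] - F[i][j] for j in range(2)] for i in range(2)]
E_L = matmul(matmul(P, E), P)
gap = max(abs(H_new[i][j] - F[i][j] - E_L[i][j]) for i in range(2) for j in range(2))
print(secant_residual, gap)
```

Both printed quantities should be at the level of rounding error, which is the algebraic identity behind the reduction of the PSB analysis to (4.1).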

This  update  is  not  invariant  under  the  transformations  given  above;  however,  we  can  analyze  it  by 
letting  y = F s -f  ^ where  ^ 0 as  above.  Then  letting  E = H — F and  ^ = 0 , we  obtain  the  L 

updating  scheme  (4.1).  Thus  we  can  prove,  under  the  appropriate  restrictions  on  the  that  the 

sequence  { } converges  to  F if  the  { } do  not  converge  to  a subspace.  If  the  { } converge  to  an 

eigenspace,  S , of  F , then  we  obtain 

lim  (%-F)=  QUQ‘ 

k— *-oo 

for  some  matrix  U . Now  since  the  limiting  subspace  of  the  { Sj^  } is  an  eigenspace  of  F, 

F = PDjP‘  + QD2Q'' 

where  and  D2  are  diagonal  matrices.  (As  above,  P and  Q are  chosen  to  have  eigenvectors  of  F 
as  columns).  Thus 

lim  H.  = PD,  P‘  + QU*Q‘ 
k-»oo  ^ 
for some $(N-M)\times(N-M)$ matrix $U^*$. This is a somewhat more general result than is available for the
Broyden class of updates because there is no restriction that the limiting matrix $F$ be positive definite;
it need only be symmetric. In the Broyden class of updates, the term $y^ts$ that appears in the
denominator can be zero when $y = Fs$ and $F$ is not positive definite. Therefore, unless the $s_k$ lie on,
or converge to, a subspace on which $F$ is positive definite, convergence results cannot be obtained for
the Broyden class of updates when $F$ is indefinite. The implications of these remarks for constrained
optimization algorithms are discussed in the next section.
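
The difference between the two kinds of denominator can be illustrated numerically. The following is a minimal sketch and not part of the original analysis: the function name `psb_update`, the 2 x 2 indefinite matrix `F`, the step `s`, and the starting matrix `H0` are all assumptions chosen for illustration. With $y = Fs$ the Broyden-class denominator $y^ts$ vanishes for this step, while the PSB correction, which divides only by powers of $s^ts$, remains defined and satisfies the secant condition.

```python
import numpy as np

def psb_update(H, s, y):
    """Return H plus the PSB correction for the pair (s, y)."""
    r = y - H @ s                       # residual of the secant condition
    ss = s @ s                          # the only quantity PSB divides by
    return (H
            + (np.outer(r, s) + np.outer(s, r)) / ss
            - (r @ s) * np.outer(s, s) / ss**2)

# Hypothetical indefinite matrix F and a step along which y^t s = 0.
F = np.diag([1.0, -1.0])
s = np.array([1.0, 1.0]) / np.sqrt(2.0)
y = F @ s

curvature = y @ s                       # Broyden-class denominator, zero here
H0 = np.eye(2)                          # illustrative starting matrix
H1 = psb_update(H0, s, y)
secant_residual = np.linalg.norm(H1 @ s - y)

print("y^t s           =", curvature)
print("|H1 s - y|      =", secant_residual)
print("H1 is symmetric =", np.allclose(H1, H1.T))
```

Any Broyden-class formula applied to the same pair would require division by `curvature`, which is why such steps must be excluded or confined to a subspace on which $F$ is positive definite.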


8.  Final  comments 


In this paper we have analyzed the convergence of a sequence of symmetric matrices, $\{H_k\}$, satisfying
rank two updating formulas, (1.5), of the type typically encountered in optimization algorithms. This
analysis was carried out independently of the particular algorithm and, indeed, independently of any
optimization problem. The results indicate that the convergence of the matrices follows from the
secant condition (1.2), the requirement that the sequence $\{s_k\}$ satisfy certain linear independence
conditions, and the condition that the sequence $\{y_k - s_k\}$ approach zero at a specified rate. The
analysis does not postulate that the sequence $\{s_k\}$ be related to any optimization iteration scheme. In
addition to establishing convergence of the sequence of matrices, the results presented show that the
limiting matrix is the identity matrix when restricted to the subspace to which $\{s_k\}$ converges. In
optimization problems where $y_k - F(x^*)s_k$ approaches zero and the Hessian matrix $F(x^*)$ is positive
definite with the subspace as an eigenspace, the limit of the matrices is identical to $F(x^*)$ on the subspace.

These results do not subsume those of earlier research, e.g., those contained in references [3]-[7],
because the conditions on the sequences $\{s_k\}$ and $\{y_k\}$ are assumed here and not derived as a
consequence of their generation in optimization problems. They do, however, provide characterizations
of the limiting matrix not given in those works and, in particular, illustrate how its structure depends
upon the subspace that is spanned by the sequence $\{s_k\}$, an important consideration in, for example,
constrained optimization. Another distinction of the results presented here is that some analysis of
convergence rates is given. The convergence rate of these matrices, under the assumptions used, is
essentially R-linear; thus it is unlikely to be recognized in an algorithm where the iterates are
converging much more rapidly, such as in a Q-superlinearly convergent algorithm.

One of the original motives for undertaking this study was to try to find alternative proofs of the
superlinear convergence of the BFGS and DFP algorithms for unconstrained optimization. Our efforts
in this regard will appear in another report [10]. In particular, we wanted to try to simplify the
proofs' dependence on the bounded deterioration estimates and the use of different weighted norms [1].
The relation between the superlinear convergence of these algorithms and the convergence of the
Hessian approximations is best understood by recalling that if the iterates $\{x_k\}$ converge and
$x_{k+1} - x_k = s_k$, then the iterates converge superlinearly if and only if

$$\lim_{k\to\infty}(H_k - I)\frac{s_k}{\|s_k\|} = 0. \tag{8.1}$$

(We assume the Hessian is the identity at the solution.) Now if the $\{s_k\}$ converge to a subspace $S$
R-linearly and the projections of the $s_k$ span $S$ in the uniformly independent manner
defined in this paper, then


= P(P‘(H^-I)Prk  + 0(?k) 

where  Therefore  by  the  results  of  this  paper  the  convergence  is  superlinear  if  and  only  if 

P^Hj^P  PP^ 

Another  motivation  of  this  research  was  the  desire  to  understand  the  question  of  superlinear 
convergence  for  sequential  quadratic  programming  ( SQP ) methods  for  constrained  optimization.  The 
necessary  and  sufficient  condition  analogous  to  (8.1)  for  superlinear  convergence  is 


$$\lim_{k\to\infty} PP^t(H_k - L)\frac{s_k}{\|s_k\|} = 0 \tag{8.2}$$



where  L is  the  Hessian  of  the  Lagrangian  function  with  respect  to  the  decision  variable  at  the  solution 
and  P P^  is  the  projection  onto  the  null  space,  S,  of  the  active  gradients  at  the  solution.  (See  [11]  or 
[12].)  Note  that  the  restriction  of  L to  the  subspace  S is  positive  definite  by  the  second  order 
sufficient  conditions.  Thus,  letting 


$$s_k = PP^ts_k + QQ^ts_k = r_k + v_k,$$


( 8.2 ) becomes 


$$\lim_{k\to\infty}\left\{PP^t(H_k - L)PP^t\frac{r_k}{\|s_k\|} + PP^t(H_k - L)QQ^t\frac{v_k}{\|s_k\|}\right\} = 0. \tag{8.3}$$

It can be seen that if the sequence $\{s_k\}$ spans $\mathbb{R}^N$ in a uniformly linearly independent manner then
the matrices $P^t(H_k - L)P$ and $P^t(H_k - L)Q$ must converge to zero if (8.3) is to hold. But $L$ need

not  be  positive  definite  except  on  the  subspace  S.  Thus  a BFGS  or  DFP  updating  scheme  initiated 
with  a positive  definite  matrix  cannot  be  expected  to  lead  to  a superlinearly  convergent  algorithm 
unless the problem is convex. On the other hand, it is seen that if the projections of the steps onto $S$
span $S$ in a uniformly linearly independent manner and the components of the steps normal to $S$ tend to
zero R-linearly then the results of this paper imply that $PP^t(H_k - L)PP^t \to 0$ and hence superlinear
convergence occurs. The normal components tending to zero is implied by the sequence of iterates
$\{x_k\}$ tending to the solution in a

“tangential”  sense  relative  to  S and  so  this  result  reinforces  and  explains  previous  observations  (see 
[13])  that  tangential  convergence  of  the  iterates  in  an  SQP  algorithm  implies  superlinear  convergence. 
The remarks in Section 7 concerning the PSB update suggest that superlinear convergence can be
obtained by an SQP algorithm employing that update, since under the assumption of uniform linear
independence of the sequence $\{s_k\}$ the PSB updates converge to the Hessian matrix without the
assumption of positive definiteness. Indeed, Han in his early work on the SQP method [14] was able
to prove superlinear convergence for this update.


It would be ideal to be able to use the convergence analysis of this paper to deduce the “best” update
scheme for a given optimization algorithm. This is unreasonable, however, since there are many other
factors that influence a numerical algorithm that are not taken into account here. More significantly,
our analysis treats the sequence of steps $\{s_k\}$ as generated independently of the sequence $\{H_k\}$
whereas in optimization algorithms the determination of $s_k$ is directly influenced by $H_k$. This

being  said,  it  is  hoped  that  some  insight  into  the  appropriateness  or  limitations  of  a particular  update 
in  a certain  situation  may  be  gained  from  this  type  of  analysis.  For  example,  the  remarks  above 
suggest  that  the  updating  methods  that  preserve  positive  definiteness,  long  favored  for  unconstrained 
optimization  algorithms,  cannot  guarantee  superlinear  convergence  when  applied  in  SQP  algorithms  for 
constrained  optimization  problems. 


One  intriguing  question  that  has  inspired  many  research  efforts  (e.g.,  [6])  is  to  explain  the 
experimentally  observed  superiority  of  the  BFGS  updating  scheme  over  the  DFP  scheme  in  quasi- 
Newton  algorithms  for  unconstrained  optimization  problems.  In  the  context  of  system  (2.2)  the  BFGS 
update  corresponds  to  the  C update  and  the  DFP  to  the  L update  (where  Hj^  represents  a direct 
approximation  of  the  Hessian  of  the  objective  function ) . If  one  interprets  the  system  (2.2)  as  a local 
approximation  to  the  system  (1.5)  then  one  might  hope  to  observe  better  convergence  results  for  the  C 
update  than  for  the  L update.  Theorem  7.4  suggests,  however,  that  their  theoretical  convergence 
properties are identical for positive definite initial matrices.

Another way to compare the local convergence properties of the Broyden class of updates is in terms of
their relation to the SR1 update. As was noted in Section 2 the SR1 updating scheme used in (2.2)
yields  finite  convergence  in  N steps  ( if  Es  0 at  an  earlier  step)  if  the  sequence  {sj^  Q linearly 

independent, while the other updates require that the $s_k$ be orthogonal to obtain the same result. This
property  can  be  expressed  in  terms  of  the  system  (2.5)  by  observing  that  if  p Sj^,  and  are  given 
with  Ej^Sj^  =0,  Sjj  = ^®k-l  then,  in  the  notation  of  (6.1), 

^k-fl  ~ ~ ~ ^^®k’^k^  ~ ‘^S^^^k’^k^  ^ *^S  ~ *^k  ^ ^^^k’^k^ 

(8.4) 

= G{v;E^)  - 0s^(v;Ej^)  -h 


where $\phi_S$ is the SR1 update parameter. Since the last term is zero if $\phi_k = \phi_S$ it is seen that the SR1
parameter is the only one that causes the update formula to be a function of only the component, $v$, of
$s_k$ perpendicular to the preceding step, $s_{k-1}$. Thus we can think of the last term, which involves
$\phi_S - \phi_k$, as the error in the update that is preventing finite convergence. Clearly, in terms of
speeding up convergence for system (2.5) (and hence (2.2)) it is best to choose $\phi_k = \phi_S$. If, however,
it is desired that positive definiteness of the matrices be preserved, then the value $\phi_S$ cannot always
be chosen. A reasonable strategy in this case might be to choose the value of $\phi_k$ as close to $\phi_S$ as
possible consistent with preserving positive definiteness, thus minimizing the error term in (8.4). From
Proposition 2.1 it is seen that positive definiteness is preserved if and only if $\phi < \phi_0$ where $\phi_0$ is a
constant greater than one whose exact value depends on the current $s$ and $H$. For the system (2.2)


$$\phi_S = \frac{s^tHs}{s^tHs - 1}$$


so that if $s^tHs < 1$ then $\phi_S < 0$ and hence positive definiteness is preserved, while if $s^tHs > 1$
then $\phi_S > 1$ and positive definiteness may or may not be preserved depending on the value of $\phi_0$.
These observations lead to the consideration of four strategies for choosing the update parameter to
maintain positive definiteness for the system (2.2):

Strategy I: (pure DFP) Choose $\phi = 0$;

Strategy II: (pure BFGS) Choose $\phi = 1$;

Strategy III: (mixed BFGS and DFP) If $s^tHs < 1$ choose $\phi = 0$, otherwise choose $\phi = 1$;

Strategy IV: (mixed BFGS and SR1) If $s^tHs < 1$ choose $\phi = \phi_S$, otherwise choose $\phi = 1$.

Based on the arguments given above, one would expect Strategy IV on average to lead to faster
convergence of the matrices than Strategy III and the latter to be better than either of the strategies I
or II. Very limited numerical experimentation on randomly generated problems has supported these
conjectures. This testing also gave an edge to Strategy II over Strategy I. A reasonable explanation
for this is that the eigenvalues of the randomly generated initial matrices were preponderantly greater
than one and, hence, so were those of the succeeding matrices. Therefore, on average, $s^tHs$ was more
often than not greater than one and thus the value $\phi = 1$ was closer to $\phi_S$ more often than was the
value $\phi = 0$. This situation would also be expected to occur for almost any realistic distribution of
positive definite matrices; thus one should expect that the strategies above are listed in increasing order
of general effectiveness. In particular, the BFGS strategy should generally outperform the DFP
strategy when there are numerous large eigenvalues (cf. [6]). Presumably, the best strategy would be
to choose $\phi$ as close to the upper bound $\phi_0$ as possible whenever $\phi_S$ is greater than $\phi_0$ and $\phi = \phi_S$
otherwise. Whether or not this is computationally attractive is not clear.
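
The four strategies can be sketched in code as follows. This is an illustrative sketch under stated assumptions, not the implementation used in the experiments mentioned above: steps are taken to be unit vectors with $y = s$, as in system (2.2); the Broyden class is written as the affine combination of the DFP and BFGS updates of a direct Hessian approximation, with $\phi = 0$ giving DFP and $\phi = 1$ giving BFGS; and the function names and the matrices `H_small` and `H_large` are hypothetical.

```python
import numpy as np

def broyden_update(H, s, y, phi):
    """Broyden-class update of H; phi = 0 is DFP and phi = 1 is BFGS."""
    ys = y @ s
    Hs = H @ s
    V = np.eye(len(s)) - np.outer(y, s) / ys
    dfp = V @ H @ V.T + np.outer(y, y) / ys
    bfgs = H - np.outer(Hs, Hs) / (s @ Hs) + np.outer(y, y) / ys
    return (1.0 - phi) * dfp + phi * bfgs

def sr1_parameter(H, s):
    """phi_S = s^t H s / (s^t H s - 1), assuming a unit step with y = s."""
    q = s @ H @ s
    return q / (q - 1.0)

def choose_phi(H, s, strategy):
    """Return the update parameter prescribed by Strategies I-IV."""
    q = s @ H @ s
    if strategy == "I":                  # pure DFP
        return 0.0
    if strategy == "II":                 # pure BFGS
        return 1.0
    if strategy == "III":                # mixed BFGS and DFP
        return 0.0 if q < 1.0 else 1.0
    if strategy == "IV":                 # mixed BFGS and SR1
        return sr1_parameter(H, s) if q < 1.0 else 1.0
    raise ValueError("strategy must be one of 'I', 'II', 'III', 'IV'")

# Hypothetical data: the eigenvalues of H_small lie in (0, 1) by Gershgorin's
# theorem, and those of H_large = H_small + 2I all exceed 1.
H_small = np.array([[0.5, 0.1, 0.0],
                    [0.1, 0.4, 0.1],
                    [0.0, 0.1, 0.6]])
H_large = H_small + 2.0 * np.eye(3)
s = np.array([1.0, 2.0, 2.0]) / 3.0      # unit step, with y = s

phi_small = choose_phi(H_small, s, "IV")  # s^t H s < 1: SR1 branch, phi_S < 0
phi_large = choose_phi(H_large, s, "IV")  # s^t H s > 1: BFGS branch, phi = 1
H_new = broyden_update(H_small, s, s, phi_small)

# Direct symmetric rank-one formula, for comparison with phi = phi_S.
r = s - H_small @ s
H_sr1 = H_small + np.outer(r, r) / (r @ s)

print("phi (IV, small H):", phi_small)
print("phi (IV, large H):", phi_large)
print("matches SR1 formula:", np.allclose(H_new, H_sr1))
print("smallest eigenvalue after update:", np.linalg.eigvalsh(H_new).min())
```

Forming the update as a combination of two nearby matrices with a large $|\phi_S|$ can be numerically delicate when $s^tHs$ is very close to one, so a careful implementation would likely apply the rank-one formula directly in that branch.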


It  should  be  emphasized  that  the  above  remarks  apply  only  to  the  system  (2.2)  and  not  to  the  system 
(1.5).  Further  experimentation  is  necessary  to  determine  if  these  conjectures  can  be  validated  for  the 
latter  system  and  applied  in  a useful  way  to  quasi-Newton  algorithms  for  unconstrained  optimization. 


A generalization  of  the  results  of  this  paper  that  also  might  yield  interesting  insights  would  be  the 
relaxation  of  the  secant  condition  (1.2).  It  seems  clear  that  the  analysis  carried  out  here  could  also 
work if another term were added to (2.5), say $U(s_k, E_k)$, with $Us_k$ converging to, but not equal to,
zero.  Such  a class  of  updates  satisfying  a type  of  asymptotic  secant  condition  would  yield  the  same 
convergence  results  and,  by  the  remarks  above,  preserve  superlinear  convergence.  Such  updates  could 
potentially  broaden  the  choices  of  updating  schemes  available  for  use  in  special  classes  of  problems. 


A final question that is unanswered by this analysis is whether the global convergence properties enjoyed
by the L and C updates when $s_k = y_k$ can be extended to other updates. In particular, it would be
reasonable to expect that the restricted Broyden class of updates would have this property.

Appendix


Here  we  address  the  proofs  of  the  theorems  in  Section  3.  We  will  not  prove  the  theorems  stated  there 
in  their  full  generality.  Rather  we  will  sketch  the  proof  of  Theorem  3.1  and  prove  one  theorem  from 
which  the  proof  of  Theorem  3.2  can  be  derived  by  generalizing  the  arguments. 


For  Theorem  3.1  we  have  two  uncoupled  systems 


and 


\+l  - 

"k+l  = "k  + Wk(tk) 


where, because of the R-linear convergence of the $\{t_k\}$,

$$|t_k| \le t_0\,r^k \tag{A.1}$$

for some $r$, $0 < r < 1$, and some $t_0$. For the first equation it can be shown by induction, using the fact
that $\|A_i\| \le 1$ for all $i$,

l"k+rl  ^ ( II  H (A^+j)  ID  l^kl  + ''I'E  l‘k+jl- 

j=0  j=0 


(A.2) 


Assuming,  without  loss  of  generality,  that  •'j+i  > kj  -f  m,  condition  (ii)  of  Theorem  (3.1)  implies 
l^k  I < /?  |zu  I + K-rj  . 

"j  + l ‘'j 


^1*1  I 

where  K = ^ — . Let  { -Cj  } satisfy 


(A.3) 


?j+l=  /3«j+ 


with  ^0  “ I'  j’  l^k.  I — ^j'  solution  to  (A.3)  satisfies 


^0  --J 


+ K-  Z K-j] 

J i=:l 


where  p = max  [jS , t].  So  from  (A. 2) , for  kj  < k < ‘‘j+i’ 

k. 

|Zkl  ^ l^k  I ^ I 

K Kj  Kq 

Thus  {zj^}  --*•  0.  If  condition  (ii)^  of  Theorem  3.1  holds,  then  without  loss  of  generality,  we  can 
assume  that  it  holds  for  all  k.  Then  for  k = j • m -f  r,  0 < r < 1,  we  have  from  (A. 2) 


^k  ' '^r+j-m'  = ^ ' ^r-|-(j-l)'m 


< Iz 


4-  K-r 


r4-(j-l)-m 


It  can  now  be  established  by  induction  on  j that  for  each  r,  0 < r < m , 


z . . I < Kf  p 
r4-j*m  ' — ^ ^ 


where  p = m€Lx[/?,r].  The  R- linear  convergence  follows.  If  = 0,  then  for  each  k 


l\+ml  ^ l^ki 

which  gives  the  m-step  convergence. 

For the second system, to show that the $\{w_k\}$ converge to some vector R-linearly we note that for
any positive integers $k$ and $i$,

j=0 
and hence $\{w_k\}$ is a Cauchy sequence converging to some $w^*$. Since

|w*-Wki  = |i^Kk+i“’^kl  ^ ^ 

it  follows  that  the  convergence  is  R- linear. 

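To show how Theorem 3.2 can be derived we consider the system of coupled difference equations in two
variables:

$$\begin{aligned} z_{k+1} &= \alpha z_k + \beta_1 (z_k)^2 + \gamma_1 (z_k + w_k)\,t_k \\ w_{k+1} &= w_k + \beta_2 (z_k)^2 + \gamma_2 (z_k + w_k)\,t_k \end{aligned} \tag{A.4}$$

where $0 < \alpha < 1$, $\beta_1$, $\beta_2$, $\gamma_1$, and $\gamma_2$ are nonnegative constants, and $\{t_k\}$ converges to zero
R-linearly, i.e., (A.1) holds.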

Theorem A.1: There exist positive constants, $\tau_0$ and a bound on $|z_0|$, such that if $|z_0|$ is less than
that bound and $|t_0| < \tau_0$ then $|z_k| \le |z_0|$ and $|w_k| < 3|w_0|$ for all $k$. Moreover, there exists a
$w^*$ such that

$$\{z_k\} \to 0$$

and

$$\{w_k\} \to w^*$$

R-linearly.


Proof: First we show, by induction, that the hypotheses imply that $|w_k| < 3|w_0|$ and that there is
a $\rho$, $0 < \rho < 1$, such that

$$|z_k| \le |z_0|\,\rho^k.$$

We let $\beta = \max\{\beta_1, \beta_2\}$ and $\gamma = \max\{\gamma_1, \gamma_2\}$


(i) 


d = (1  + a)/2, 
(ii) 

(iii) 

(iv) 


P = 


max  {or  , r ) , 


?o= 


( 1 — P“).l/2 


1} 


. , (1  - /’)|zol  (1  - r)  , 

"0=  -"‘"f  SIwqIt  ’ -4^>- 


Clearly the inequalities are true for $k = 0$. Assume them to be true for $j = 0, 1, \ldots, k$. Then,
using (i), (iii), and the induction hypotheses,

|zk+ll  ^ hkl  + ^l^kl'I^k* 


< a |zq|p^  -h  4 


Using  (ii)  and  (iv)  we  have  (since  k > 1 ) 


(A.5) 


^k+1 


p(  1 - p)  |z 


0 


} < IzqI  ^ 


k-Hl 


Then  for  each  j , j = 0, . . . , k , we  have 


< |wj|  + /^IzqI^p^-^  4-  Tdz^lp-^  -h  SlwgDItglp 


2j 


(A.6) 


< I wj  + { I I ( 1 - p^  ) + 4 7 I Wq  I Tq  } p 


2J 


< |wj|  + 2|wq|(1-p“) 


Clearly, the sequence $\{|w_j|\}$ is majorized by the sequence $\{c_j\}$ satisfying

$$c_{j+1} = c_j + 2c_0(1-\rho^2)\rho^{2j}, \qquad c_0 = |w_0|,$$

for $j = 0, \ldots, k$. But then

$$c_{k+1} = c_0 + \sum_{j=0}^{k}[c_{j+1} - c_j] = c_0 + 2c_0(1-\rho^2)\sum_{j=0}^{k}\rho^{2j} < 3c_0.$$

Thus

$$|w_{k+1}| < 3\,|w_0|,$$



which completes the induction step. Now it follows from (A.5) that $\{z_k\} \to 0$ at an R-linear rate.
It remains to show that the $\{w_k\}$ converge to some vector R-linearly. From the difference equation
for the $\{w_k\}$ and the estimates in (A.6) it is seen that for any positive integers $k$ and $i$,

$$|w_{k+i} - w_k| \le 2|w_0|(1-\rho^2)\sum_{j=0}^{i-1}\rho^{2(k+j)} < 2|w_0|\,\rho^{2k}$$

and hence $\{w_k\}$ is a Cauchy sequence converging to some $w^*$. Since

$$|w^* - w_k| = \lim_{i\to\infty}|w_{k+i} - w_k|$$

it follows from the above inequality that the convergence is R-linear.
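
The majorizing recursion $c_{j+1} = c_j + 2c_0(1-\rho^2)\rho^{2j}$ used in the induction step can be checked numerically. The sketch below is only an illustration: the function name `majorant` and the values $c_0 = 1$, $\rho = 0.9$ are assumptions, not quantities from the proof. It compares the iterates with the closed form $c_k = c_0 + 2c_0(1 - \rho^{2k})$ obtained by summing the geometric series, which stays below $3c_0$, and checks the tail estimate $c_{k+i} - c_k < 2c_0\rho^{2k}$ that gives the R-linear rate.

```python
def majorant(c0, rho, steps):
    """Iterate c_{j+1} = c_j + 2*c0*(1 - rho**2)*rho**(2*j), starting at c0."""
    c = [c0]
    for j in range(steps):
        c.append(c[-1] + 2.0 * c0 * (1.0 - rho**2) * rho ** (2 * j))
    return c

c0, rho = 1.0, 0.9                       # illustrative values only
c = majorant(c0, rho, steps=50)

# Closed form from summing the geometric series.
closed_form = [c0 + 2.0 * c0 * (1.0 - rho ** (2 * k)) for k in range(len(c))]

largest = max(c)                         # remains below 3*c0
tail = c[50] - c[10]                     # bounded by 2*c0*rho**20

print("largest iterate:", largest)
print("tail c_50 - c_10:", tail, "bound:", 2.0 * c0 * rho ** 20)
```

The same geometric-series argument is what bounds the differences $|w_{k+i} - w_k|$ in the final step of the proof.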


Theorem 3.2 can now be proved by generalizing the above theorem to the case where $z_k$ and $w_k$ are
vectors and the condition $0 < \alpha < 1$ is replaced by conditions (i) and (ii)' of Theorem 3.1.